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Foreword 


The International Conference on Construction Applications of Virtual Reality (CONVR), as one of the world's 
leading conferences in the areas of immersive realities and digital transformation in AECO Industry, and the 
local organizing committee are pleased to present the Proceedings of the 23"! International Conference on 
Construction Applications of Virtual Reality (CONVR 2023) with the overarching theme "MANAGING THE 
DIGITAL TRANSFORMATION OF CONSTRUCTION INDUSTRY". 


The 23 CONVR was held on November 13-15, 2023, in Florence, Italy and was proudly hosted by the 
Department of Architecture of the University of Florence. 


CONVR 2023 brought together AECO researchers and practitioners from around the globe to report on and 
exchange the latest development, ideas, improvements and applications stemming from innovative research 
activities in the following fields: Virtual Reality (VR) and Augmented Reality (AR), Reality capture and 
Photogrammetry, H-BIM for heritage management, Simulation and Automation techniques, Computer Vision 
and Image Processing, Linked Data and Semantic Web for Knowledge Management, Smart Contracts, 
Distributed Ledger Technologies and Blockchain, Data Science, Machine Learning, and Data-Driven 
Approaches, Health & Safety, Green and smart buildings, Occupant-centric building design and operation, 
Building Information Modeling (BIM), Digital Twins, Internet of Everything, Mobile and wearable computing, 
Construction site management. Those topics were articulated in eight different areas: Methodology, 
Technology transfer, Technology, State of Art, Theoretical Study, Policy and Standardization, Education and 
Training, Case Study and Application. 


A total of 123 high-quality contributions were accepted after a rigorous review process from 71 esteemed 
members of the conference’s International Scientific Committee. The accepted papers include a total of 374 
authors from 32 countries, from Europe, the Americas, Asia and the Middle East. 


More than 150 experts attended the conference contributing to enriching the exciting program which included 
6 keynote speeches on the first day and 4 parallel presentation sessions on the following days, together with 5 
workshop sessions. 


The editors trust that this publication is stimulating and inspiring for academics, scholars and industry experts 
in the field; hoping that this could be a driving force for innovation, growth and global collaborations among 
researchers and stakeholders. We believe in the significant role that human interactions, networks, knowledge 
exchange and transfer play in developing high-value and groundbreaking research. This event provides a 
platform for networking and intellectual exchange of ideas. 


We take this opportunity to express our gratitude to the CONVR2023 Technical Organizing Committee as 
well as our esteemed reviewers and sponsors. The creation of such a broad and high-quality conference 
program would not have been possible without their involvement and support. We also thank all the authors 
who dedicated much of their time and efforts to contribute to CONVR2023. We extend our best wishes to you 
and look forward to seeing you next year for CONVR2024. 


CONVR2023 Local Chairs 


Prof. Pietro Capone Dr. Vito Getuli 
Conference Chair Chair of the International Scientific Committee 


> 


Papau BÇ 
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INVESTIGATION OF THE ACCEPTANCE OF VIRTUAL REALITY FOR 
PLANNING DECISIONS IN EARLY DESIGN PHASES 


Daniel Napps & Markus König 
Ruhr University Bochum, Germany 


ABSTRACT: In recent years, with the increasing digitization of the construction industry, the potential benefits 
provided by the adoption of Virtual Reality (VR) have been shown especially in interdisciplinary networking among 
different stakeholders for which effective communication and information exchange methods are crucial. This is 
particularly significant during early design phases and associated decision-making processes where, despite its 
positive impact in terms of projects time and cost savings, VR adoption still has to reach its full potential. For this 
reason, this paper investigates to what extent the acceptance and application of VR has developed and identifies 
possible integrations in the early planning phases. By conducting a multi-year study with representatives from the 
construction industry, including qualitative and quantitative survey methods, the current use of VR and the 
requirements for future applications are determined. The study reveals that VR's importance for design 
visualization has increased, identifies architects' current requirements and integration barriers. Additionally, these 
requirements are compared with existing VR possibilities and an approach for exchanging different variants in a 
building information model will be examined. Based on these findings, VR can be integrated in application-specific 
contexts and software can be adapted to architects' needs for optimizing the digitization process. 


KEYWORDS: Virtual Reality, Decision Support, Early Design Phase, Design Visualization, Study 


1. INTRODUCTION 


Virtual Reality (VR) has made significant advancements in recent years and is increasingly being employed across 
various industries to foster innovative solutions. In the construction industry, VR presents novel opportunities and 
revolutionizes the approach to project planning, design, and realization (Ozcan-Deniz, 2019). For designers, 
architects and clients, one of the biggest challenges in the construction industry is the difficulty in visualizing a 
building project in the early design phase. Traditional methods using 2D drawings and 3D models have inherent 
limitations in conveying an accurate representation or illusory stimuli (Paes et al., 2017). At this point, VR becomes 
an important factor. By using VR technology, users are able to enter an immersive virtual environment and 
experience the future building or infrastructure in 3D. Architects are able to visualize their designs in virtual 
environments, which enables them to improve their understanding of proportions, designs, and functionalities. 
Developers have the ability to take virtual tours of their future properties, giving them a realistic feel for the space 
and amenities. Moreover, VR enables improved collaboration between the various stakeholders in a construction 
project (Davila Delgado et al., 2020). This allows architects, engineers, developers and other stakeholders to 
collaborate together in the Virtual Reality environment, make modifications, identify problems and develop 
solutions even before the actual construction process begins. Besides saving time and money, this may also reduce 
potential errors and miscommunications. 


There are different types of VR technologies, each offering unique approaches to allow immersion in virtual worlds. 
Tethered VR systems are powerful solutions that require a connection to a computer. They incorporate external 
sensors or cameras to track the user's movements and ensure precise interaction in the virtual environment (Casini, 
2022). Established VR companies such as High Tech Computer Corporation (HTC), META, Valve and Windows 
are already indicating strong growth in the segment over the next few years (Steam, 2022). Standalone VR devices 
are autonomous items that do not require a connection to an external computer. Instead, they integrate the display, 
processor, and tracking technology into a single device and offer a certain degree of mobility and ease of operation. 
By introducing the Apple VisionPro and its release in 2024, there is a potential for the market in this segment to 
grow even more in the future, as other technologies from the developer have already had a strong impact on the 
respective sales market. Mobile VR effectively uses smartphones as displays and processing power for virtual 
experiences. It allows users to plug their smartphone into a VR headset and access a wide range of VR applications 
or games (Casini, 2022). Available hardware is suitable for both private and professional applications. Unity, for 
example, is an applied development environment in architecture for the visualization of VR projects (Boeykens & 
Gawade, 2013). Due to these market trends, developing technologies and fields of application, it is relevant to 
investigate the actual potentials and evolution of VR acceptance for the construction industry in Germany. Older 
studies, such as from the United Kingdom, indicate that integration is imminent in large parts of the construction 
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industry (Bouchlaghem et al., 1996), and even more recent studies simply conclude that the AEC industry in the 
next few years is changing its previous path toward utilizing (Noghabaei et al., 2020). A study from Australia has 
adopted a similar approach to investigate the adoption of VR in the construction industry. However, this study is 
limited by a small sample size and the restriction of participants to the community of two universities in Sydney, 
Australia. As a result, there is no direct assessment of VR adoption among the stakeholders in the construction 
sector (Ghobadi & M.E. Sepasgozar, 2020). Country-specific surveys on the status of VR in the construction 
industry are rare, have been conducted many years ago, and do not include any follow-up studies. There is no 
specific data available for the construction sector in Germany. 


Despite its potential benefits, there are several challenges associated with the use of VR for business purpose. 
Implementing VR technology in the workplace may be expensive, especially the initial investment in hardware 
and software. Additionally enquires employees to train and understand how to operate the equipment and navigate 
in the virtual environments effectively (Prabhakaran et al., 2022). VR technology often collects and stores user 
data, including personal and behavioral information. Maintaining data security and privacy is crucial to protect 
sensitive information and comply with regulations. To explore these and other similar potential barriers to the 
widespread adoption of VR in the construction industry, studies have been conducted in various years involving 
different target groups in the discipline. The study aims to capture the importance of VR for design visualisation 
in Germany and will explore the current requirements of architects and barriers for integration. A specially created 
sample application serves as a visual illustration of the VR possibilities and, in addition to the findings from the 
study, investigates the exchange of different design variants in a building information model. 


The structure of this paper is as follows: Section 2 provides an overview of the research approach, presenting the 
research methodology and providing a supporting visualization for evaluation. The findings from the conducted 
studies are presented in section 3. In this regard, a categorization is carried out to investigate the potentials of VR 
on a topic-related basis. Section 4 discusses the findings and provides suggestions for future research. 


2. BACKGROUND 
2.1 VR in the construction industry 


There are many potential applications for Virtual Reality in the early design phases for construction projects. 
Besides design presentations, simulations, marketing purposes and construction site inspections the collaborative 
planning is a significant factor. Simulations, for example, can be used to test specific aspects of a project, allowing 
for customizations to be made according to the building design. Virtual simulations of sunlight exposure provide 
insights into the materiality of building components (e.g., on the facade), light reflection and consider shadows on 
surrounding buildings and streets, while rain scenarios help to capture the direct runoff of rainwater on, around, 
and adjacent to a building in the early phases. For marketing purposes, the early design phase is less relevant. 
Instead, investors or potential buyers can be shown the project or building in detail, by a digital tour. VR-based 
construction site inspections allow for the early evaluation of safety measures as well as safety training and later 
monitoring of construction progress (Zhang et al., 2022). For many of these applications, a BIM interface is 
beneficial so that modifications can be made and saved directly in the Virtual Environment. These can be than 
done in virtual spaces, which enable teams from different locations to collaborate virtually on designs and plans, 
facilitating decision-making and coordination of work processes (Jensen, 2017). 


2.2 Acceptance measurement 


In terms of the acceptance of the technology in the construction industry, there is a need to operationalize it. For 
user acceptance, various models such as David's Technology Acceptance Model (TAM) in the field of innovation 
(Davis et al., 1989) or Kollmann's Dynamic Acceptance Model which was developed to analyze the acceptance of 
innovative goods can be utilized (Geldmacher et al., 2019). These are commonly applied in the field of information 
systems technology. According to the latter, the process of acceptance is divided into three levels: 


Attitude level: This level begins with the awareness of a product, which results in the user's interest and 
expectations. 


Level of action: In the process, the user makes initial tests and experiences that can result in a purchase 
and subsequent implementation for appropriate utilization. 


Utilization level: Regular access to the product ensures its continued use. 


SECTION A - EX NOLOGIES IN CONSTRUCTION 
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These levels span from the initial purchase decision to actual usage over time. The levels are interconnected, 
meaning that successful completion of one level leads to the next, ultimately culminating in the overall acceptance 
of a product. Therefore, assessing the overall acceptance necessitates examining all these levels. In this context, 
the potentials of the VR technology have a significant impact as they can positively influence the levels and 
therefore the overall acceptance. 


2.3 Variant Management 


In the early design phases, architects have various options for designing individual parts of the building. By using 
the Variant Management, it is possible to store and retrieve different variants for a specific building component in 
a digital building model. The methodology is designed to provide architects with decision support. This decision 
support can be advantageous for a virtual and immersive assessment of a project since different options for 
construction variants can be explored. The Variant Management encompasses three types of variants: Structure 
Variants, Function Variants, and Product Variants. Structure Variants refer to building elements related to the 
structure of the building, such as exterior walls. Function Variants are components that fulfill specific functions 
(e.g., columns), while Product Variants categorize individual products (e.g., windows) (Napps et al., 2021). 


At the initial planning process, building elements must be categorized by an architect according to the three 
variants in order to ensure to find possible design alternatives later on. This categorization of elements into variants 
can either be done in a BIM software and provided with additional options (Napps et al., 2022) or based on the 
IFC data. The approach is performed in a graph-based format. Here, the exported IFC file of the model is 
transferred to a graph representation, whereby IFC entities are represented as nodes and IFC relationships as edges. 
Possible options are added to an existing variant in form of an option node, which results in the overall graph as a 
parallel possible option for an element (Napps et al., 2021). VR visualization allows architects to immersively 
experience the impact of a variant and how it compares to other stored options. If no options are stored in a model, 
problem-specific features can be specified for a variant in a separate database to obtain possible solutions from 
other projects as potential options. By choosing an option from another model, the selected option will be 
automatically stored as an option node to the graph. The selection of a design option can stem from either a 
calculation of similarity to other variants or on a specific aspect of investigation. Recently, Variant Management 
has been used to carefully evaluate design options or alternatives for cost efficiency and assessed alternatives to 
building components based on their costs to identify the optimal solution for a project (Napps et al., 2023). 


3. RESEARCH DESIGN 


The research design consists of primary research involving online surveys and expert interviews supported by a 
literature review to identify the current status and potentials of VR and its use in the early design phases. In order 
to investigate the current integration and acceptance of VR in the construction industry, a study over several years 
is conducted, as well as a determination of the acceptance of the technology among potential stakeholders. 
Acceptance has an important impact, as it is directly related to the usage. An identification of affected actors in the 
construction industry, the reduction to research-relevant stakeholders and the identification of actual potentials and 
the operationalization of acceptability is therefore necessary (Fig. 1). 


Literature review 


Identification of actors 


First Second Study Third Study 
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Fig. 1: Research design schedule. 


In the construction industry, there are multiple stakeholders working together who might benefit from cooperation 


and coordination via VR in projects. This includes primarily architects, who have to consider the planning concept, 
the design wishes and ideas based on the requirements of the clients and on legal circumstances. A planning agency 
is a service company in which civil engineers, specialist engineers and planners work together. Geographers, 
geometricians and geoinformatics potentially can come into contact with VR in the planning process, for example, 
when it comes to generating city or terrain models or data bases. Real estate developers and investors are equally 
important actors who benefit from early visualization of the project in VR for new buildings, as well as for 
renovations and modernizations. Municipalities occupy a significant position due to the facts that, on the one hand, 
they have the primary role in the planning process in Germany, and on the other hand, they are highly significant 
as contractors for architects and investors (Emmitt, 2010). Of these, smart cities are highlighted, which are 
characterized by their significantly more innovative character compared to other municipalities. Furthermore, the 
public benefits from project renderings in VR, as they are able to receive a 3D visualization of the planning through 
the public participation required by law in Germany (§ 3 Abs. 1 BauGB) and do not have to rely on a planning 
knowledge of 2D plans and views. 


Due to the limitation to the early design phases because of the increased relevance for planning decisions and error 
minimization (Ostergard et al., 2016), not all of the mentioned stakeholders are of equal importance for potential 
usage and acceptance determination. Therefore, these are to be reduced due to the observation scope. In this context, 
the decision was made to specifically target architects and also gain insights into the overall sentiment of German 
cities. This approach allows for capturing aspects from both the supply side (architects) and the demand side 
(municipalities). Volunteers from the population were also invited to take part in a survey, as the Planning Act in 
Germany provides that they must be involved in construction projects and plans. For the identification of potentials, 
different assumptions resulting from research with regard to advantages and disadvantages for the use of VR were 
analyzed and summarized. These were formulated in various forms into neutral closed and open questions as much 
as possible. 


3.1 Interview 


In this research, the integration of interviews and surveys offers a broader and more comprehensive approach to 
gather data. Whereas interviews have a minor role in this study, they are instrumental in obtaining qualitative data 
and gaining deeper insights into individual experiences, which subsequently informed the design of the survey 
questionnaire (Rafidah Biniti Ab, Rahman, 2023). By combining these research methods, the results can be 
validated and triangulated, enhancing the overall reliability and credibility of the data and the interpretations made. 


Table 1: Overview of the interviewees. 


Year Date Leading Gender Country of origin Field of expertise 
questions 
Interview 1 2020 7" July 16 female Germany PhD and urban researcher in the context 


of the digitalization of cities 


Interview 2 2020 13" July 17 female Austria Focus on urban development and public- 


private partnerships in digital context 


Interview 3 2020 16" July 18 male Switzerland Software developer for mixed reality 


and cloud computing 


Interview 4 2020 12" August 14 male Germany Honorary professor and member of a 


progressive digitization network 


Interview 5 2020 14" August 12 male Austria Chief marketing officer of an augmented 
and Virtual Reality startup 
Interview 6 2020 2" September 10 female Switzerland Focus on Digital Real Estate and 


member of an innovation team 


At the outset of the study in 2020, interviews were conducted with specific stakeholders to shape the questions for 
the study and gather initial insights into the adoption, application, potentials, and barriers of using Virtual Reality 
in the construction industry. A total of six individuals from different disciplines were selected for expert interviews 
at various professional levels, which are shown in Table 1. Care was taken to include individuals with technical 
understanding, as well as scientific and practical knowledge of working with Virtual Reality in the construction 
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industry. The six interviewees were equally distributed among the fields of software development, academia, and 
urban management and marketing (Tab. 1). All of them had experience with urban-level projects or collaboration 
with authorities responsible for planning. The experts represented both Germany and other European countries. 
The duration of the interviews, guided by specific questions, was approximately 60 minutes each. Starting with 
general topics, the discussions moved on to concrete questions about example projects and the involved actors. 
Due to partial critical comments on the topic and missing consents of personal data, the experts are anonymized. 


3.2 Study 


Quantitative research uses standardized surveys to collect data for a large sample size of individual people, groups 
of people or institutions. The methodology is used to analyze and describe mass phenomena. Online surveys are 
characterized by their monetary advantage, location independence of the respondents and low temporary effort of 
collection and evaluation (Mellinger & Hanson, 2021). No related data is stored in the questionnaires, which 
ensures honest statements and opinions. The questionnaires were initially distributed individually to different 
groups of persons (Planning and architectural offices) and later by a random principle, e.g., with the help of the 
inclusion in the newsletter of the Federal Chamber of Architects of North Rhine-Westphalia. The surveys were 
provided as an online survey and sent via link to the appropriate contacts. Three surveys were conducted over the 
course of three distinct years, with each survey period lasting approximately six weeks. The initial survey was 
carried out in July 2020, followed by a subsequent survey in July 2022, and a third in mid of July 2023 (Tab. 2). 


Identification and categorization of questions through literature review and expert interviews 


Sociodemographic Attitude level Level of action Utilization level Factors of increasing 
factors acceptance 
we E n 
on os Zo B oll 
Identification Awareness Test Access Improvement factors 


Interest Acquisition Utilization 


Expactations Implementation 


Fig. 2: Structure of the first study according to the acceptance measurement and for the evaluation of trends. 


The first study from 2020 was a large-scale study, a deliberate combination of closed (yes/no) questions and open- 
ended questions was used to capture the situation of Virtual Reality in the construction industry as accurately as 
possible from different perspectives (Fig. 2). For the first study in 2020, different municipalities and smart cities 
were drawn from a sample of all German cities. Based on the cities identified, planning offices were selected in 
each case. Due to the significance of public participation (§ 3 Abs. 1 BauGB) and the benefits identified in the 
literature, the survey was conducted among both the identified stakeholders in the construction industry and a 
small sample of the population in Germany. In total, the first study thus consisted of four individual surveys. The 
sample size was chosen equally for the first three categories (Tab. 3). One follow-up reminder email was sent after 
half of the survey period. 


Table 2: Overview of the studies. 


Year Questions Question type Participants Country oforigin Estimated duration Objective 

Study 1 2020 16 Mixed 60 +55 Austria, Germany, 12min Large-scale data 
Switzerland acquisition 

Study 2 2022 19 Mixed 53 Germany 10min Data acquisition 


for comparison 


Study 3 2023 9 Closed 11 Germany 3min Data acquisition 
for trends 
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Although the second study was conducted for the use of BIM and Variant Management, it includes a thematic 
block on VR. Therefore, the second follow-up study aimed to capture potential changes and trends in the adoption 
and acceptance of VR in the field. The last study was designed to collect final determining factors and is still 
ongoing. Only single choice questions were used in the third survey to compare most significant factors for the 
success or failure of VR integration and adoption with the previous ones. Actually, the survey period of the last 
one has not yet ended. The number of questions varied depending on the study. Overall, the studies were designed 
with a range of 6 to 16 questions. For the last one, a lower number of questions was used to ensure quick completion 
by the participants. However, the structure always followed a systematic framework, starting with the general use 
of VR for projects, followed by exploring advantages and disadvantages, as well as other criteria for measuring 
the potentials and acceptance. The questions were selected according to the operationalization of acceptance and 
divided into corresponding categories. In this regard, besides dichotomous and multiple-choice questions, Likert- 
scale survey questions and question types such as ranking and matrix questions were predominantly employed. A 
total of (60 (+ 58 participants from the general Population) + 53 + 11) participants were involved in these multi- 
year’s studies, whereby it is to be mentioned that not all respondents answered each question. An overview about 
these studies is shown in Table 2. 


3.3 Experimental realization of the Variant Management in VR 


The study was supported with images to enhance participants' understanding of the use of Virtual Reality in the 
construction industry (Fig. 3). Towards this intent, a building was created which also served as an evaluation 
example to explore any potential opportunities and barriers in the realization of a project, focusing on the 
adaptation of a building design in VR. The building information model was created in Autodesk Revit and exported 
to Unity for visualization in a virtual environment (Fig. 3a). Utilizing the Unity Reflect Review enables design 
validation before and during a project. For this purpose, various options for building elements were modeled, such 
as different window variations, doors, and facade options (Fig. 3b & 3c), allowing architects to experience the 
changes in the building from both the interior and exterior perspectives. This approach aligns with the practical 
work of architects who evaluate different design variants for a building in early design phases and experience 
modifications in 3D to gain a better understanding of the effects of different building elements (Fig. 3d). 


(b) 


(d) 


Fig. 3: (a) Building information model, (b) Building design variant in Unity, (c) Checking for stored design 
options with information and live interaction (Oculus Quest 2), (d) Customization and comparison. 


4. FINDINGS 


The initial study aimed to explore the current state, potentials, and acceptance of Virtual Reality (VR) in various 
sectors of the construction industry. To achieve a comprehensive understanding, the study included not only 
planning offices but also municipalities, smart cities, and segments of the general population in Germany. The 
subsequent two follow-up studies, however, narrowed their focus to examine planning offices primarily. This 
decision was driven by the fact that planning offices play a central role in the planning phases, whereas 
municipalities, smart cities, and the population are more closely associated with the final outcomes of completed 
projects. The results are categorized in sub-sections to facilitate comparisons of the studies. A total of 182 responses 
from the studies can be evaluated for this purpose. The three acceptance levels are considered in the discussion. 


For the findings, the abstentions were mostly omitted from graphical representation to maintain clarity. The 
evaluation of the study is divided into three parts. Beginning with an assessment of VR's awareness and integration 
in the construction industry, it is followed by an exploration of its potentials and barriers. Findings regarding the 
creation of the VR example project, including various design options and the implementation are shown afterwards. 
In the discussion, the level of acceptance of VR technology in the construction industry is presented in the context 
of the demonstrated measurement process. 


The response rates for the first study, illustrated in Table 3, range from 28% to 48%. Additionally, 58 individuals 
from the population participated. Incomplete responses to questions were not included in the response rate and 
were also excluded from the results. For the subsequent studies, voluntary participation was encouraged among 
different offices, because the response rate was the lowest in 2020. Based on a large-scale study examining 8,672 
surveys and 1,071 online surveys in educational research, a weighted average response rate of 44.1% for online 
surveys in educational fields was identified (Wu et al., 2022). Different online survey platforms indicate an 
acceptable response rate between 5% to 30% and values above this correspond to a very good response rate (Chung, 
2022; Le Masson, 2023). In the study from 2022, there were several abstentions on some questions, which resulted 
in the lowest rate for answering a specific question at 41 out of 53 participants (77.3%). All questions for the year 
2023 were completed. 


Table 3: Response rate of the first study 2020. 


Sample Incomplete responses Complete responses Average time needed Response rate 


Municipalities 50 23 19 approx. 6min 38% 
Smart Cities 50 37 24 approx. 7min 48% 
Planning and architecture offices 50 16 14 approx. 6min 28% 
Population - - 58 approx. 5min - 


4.1 Actual awareness and integration 


The study reveals that both awareness of Virtual Reality for the use in the construction industry and the resulting 
potential work experiences vary among the surveyed groups. It is evident that the majority of direct planning 
stakeholders (Municipalities, Smart Cities, and planning and architectural offices) are aware of the application of 
VR in this context, while the general population is more divided. The division among the public was already 
suspected during the interviews, as one expert from the scientific perspective believes that citizen participation 
regarding the acceptance of the technology is currently most advanced, while the software expert is of the opinion 
that the public is still unaware of it all. However, all interviews indicated that awareness among the population is 
increasing with a growing number of VR projects, and through the gamification factor, the usage and involvement 
of the population in projects can be enhanced. Some pilot projects from practitioners can confirm this impression. 


Concerning Smart Cities and municipalities, the experts assumed that they are already aware of the technology, 
but the adaptation processes are challenging. The interviewees with practical background are already familiar with 
cities that work on projects in VR, but they point out that Smart Cities are likely to take the lead, while other 
municipalities may wait and observe these developments. However, it is said, that there is a general interest in the 
topic. It is also noted that larger cities might have a higher interest than smaller towns. Out of 24 respondents, 20 
(83.3%) demonstrated the highest awareness of VR technology and its application in favor of Smart Cities, whereas 
10 out of 17 municipalities (58.8%) showed awareness of the technology (Fig. 4). Two municipalities did not reply. 
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Regarding the planning offices, a relatively low penetration of awareness and the resulting integration and 
application was predicted by the interviewees before the study. First movers would take the lead here, but the 
technological novelty would not initially revolutionize conventional planning processes. Offices that deal with 
renderings at a high level are said to experience increased interest, as it enables them to maintain or even improve 
quality standards. Among the surveyed offices, 9 out of 14 (64.3%) had heard of VR (Fig. 4), and 5 out of 9 
(55.5%) had already worked with it (Fig. 5), indicating a high level of awareness and practical experience. 
Regarding to the work experience, among Smart Cities and municipalities, the percentages were 50% and 30% 
(Fig. 5). In relation to their previous awareness levels, with municipalities having the lowest percentage of work 
experience. 


The study from the year 2022 depicted that the utilization of VR for the actors within architectural and planning 
offices is nearly 10% higher than in 2020. According to the 2023 study, this positive trend is not continuing, as 
54.5% of these offices report that they have already integrated VR into their everyday work (Fig. 7). This indicates 
that the level has adjusted to the 2020 level. However, the willingness to use it (71,4% in 2020) continues in a 
moderate form in 2023 (54,6%). 
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Fig. 4 (left): Comparison of the awareness (2020). 
Fig. 5 (right): Comparison of the practical experience (2020). 


Interest in the utilization was captured by questions regarding the willingness to incorporate Virtual Reality into 
the daily work routine and is shown in Figure 7. With the aid of an example integrated into the questionnaire, even 
participants who were previously unfamiliar with its application could partake in the question. As a result, it 
becomes apparent that 31.6% of the surveyed municipalities are ready to use VR in the near future, and 63.2% are 
neutral and do not reject the possibility of potential future adoption. Among Smart Cities, 50% express a 
willingness to use VR, with no direct rejections. As for the planning firms, 42.9% of the respondents have an 
interest in its utilization only 2 out of 14 offices do not want to use VR in future. In the year 2022, a highly positive 
trend was observed regarding the financial benefits derived from the implementation of technology among these 
actors (Fig. 7). One disadvantage, however, is the readiness of the hardware, which negatively affects the potential 
utilization. This primarily concerns the urban stakeholders, as five municipalities (26.3%) and six Smart Cities 
(25%) are not willing to provide hardware for meetings with other stakeholders, while one office (7%) is unwilling 
to do so (Fig. 7). The experts also perceive the provision of hardware, especially for small offices and small cities, 
as challenging. Experts in technology hinted in 2020 that with the introduction of Apple's proprietary software, 
both the general population and the cities and planning offices would benefit since these devices would reach a 
certain level of maturity and appeal to a broad range of citizens. 
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Fig. 6 (left): Comparison of the potential utilization (2020). 
Fig. 7 (right): Comparison of the utilization propensity and financially worthwhile for offices (2020-2023). 


As the potential utilization is related to the expectations of the technology, these expectations were also collected 
accordingly. For this purpose, three categories were identified, and the respondents were asked to indicate the 
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extent to which they agree with each statement. Despite some negative remarks, the expectations regarding VR 
are predominantly aimed at achieving more efficient workflows with other stakeholders, improved communication, 
and increased public participation. The 2023 results confirm that, but reveal that respondents overwhelmingly 
agree that the use of VR in the construction industry is neither in high nor low demand. However, most respondents 
(72,3%) agree that VR representations have a high impact on the construction industry. 


4.2 Identified potentials and barriers 


The assessed potentials arising from the utilization of VR in the AEC industry are contrasted with the barriers 
encountered based on diverse experiences, offering a comprehensive comparison. Various response options 
identified in the literature and the interviews were given to capture the potentials and barriers for integrating Virtual 
Reality in the construction industry, with multiple responses allowed. The results primarily stem from the extensive 
study conducted in the year 2020. Trends are being verified with the two additional studies. 
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Fig. 8: Results of the first study on the potentials of the use of VR in the construction industry. 
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Fig. 9: Results of the first study on the barriers of the use of VR in the construction industry. 


The most significant advantage, cited by 71.4% of planning offices, is the ability to create appealing simulations, 
followed by nine offices (64.3%) that agree to the advantage in better collaboration with other planning 
stakeholders (Fig. 8). Error minimization gained the greatest advantage among respondents in 2023 (36,4%). As 
the most significant disadvantage (71.4%), these stakeholders mention the financial burden. The two additional 
studies also confirm, with the highest number of agreements, the potentials of VR for simulation and collaboration, 
identified as the greatest potential for planning offices and their respective stakeholders. Additional monetary 
expense remains the most important barrier factor across the board, even within the study group (Fig. 9). 


The three most significant potentials of the municipalities are transparent presentation of information (24.1%), 
improved collaboration with the population (17.2%), and the creation of appealing simulations (15.5%) (Fig. 8). 
The most significant barriers are the financial burden (22.7%), followed by the potential need for hiring new 
personnel, longer adaptation time, and the requirement for high computing power, each cited by 15.2% (Fig. 9). 


For Smart Cities, the potentials are in the same three categories but with different weights (Fig. 8). The potential 
areas are transparent presentation of information (17.6%), improved collaboration with the population (19.4%), 


and engaging simulations (20.4%). As barriers for the integration of VR in the construction industry, the major 
concerns are the financial expense (37.3%), the need for hiring new personnel (27.1%), and challenges in operating 
the technologies as well as a low demand (13.6%) (Fig. 9). 


In the study, the indicated potentials, based on the cumulative votes, outweigh the cumulative number of votes for 
the barriers. 


4.3 Practical implementation 


During the implementation of the Variant Management for visualization in VR, minor obstacles initially emerged, 
as saved building options cannot be stored directly in the software environment. Instead, only the selected variant 
is saved as a Revit file. In consequence, two versions of the designed building had to be saved for the visualization, 
each with different variants selected. Finally, the visualization is performed with Unity Reflect Review. 


Another complication occurred attempting to use the Revit files directly in Unity Reflect Review, as the Unity 
Engine does not support the Revit format (.rvt). Alternatively, saving the file in the format (.fbx) facilitated the 
export and import of 3D objects, 2D objects, light sources, cameras and materials between Autodesk software 
programs and, since 2017, also between Unity. However, the object is displayed generically and in grey because 
Unity does not recognize the materials from Revit. Manual input of new materials that are recognized by the Unity 
library does not include all Revit materials and proves to be too time-consuming. Solving the problem is to 
alternatively use the IFC file for the two models and import it into Unity. A script then allows the BIM metadata 
to be retrieved and displayed within the object by having the script read the elements' information from the IFC 
file. It is important that the IFC file comes from the same 3D model referenced by the FBX file. 


5. DISCUSSION 


The studies have demonstrated that among the examined direct planning stakeholders, there is a predominant 
awareness of VR for planning processes. Many of these respondents also exhibit an interest in usage, some of 
which has even slightly increased. They highlight significant expectations of VR for planning tasks, aligning with 
the identified current capabilities. The results of the first study indicate that several municipalities, half of the 
surveyed Smart Cities, and the majority of the surveyed planning offices have already worked with and tested VR. 
While the willingness to acquire suitable hardware was generally portrayed as positive in all studies, the highest 
barrier, particularly for planning offices, was the financial extra expenditure. In terms of implementation progress 
in daily operations, a positive trend can be observed from 2020 to 2022. However, the results from 2023 potentially 
suggest a plateauing. It should be noted, however, that the study does not reflect the total number of offices, so the 
results from 2023 should rather be understood as a sample for trends. The provision of hardware is accompanied 
by the already examined acquisition. The willingness to use VR is strongly pronounced among the examined 
stakeholders, and the potential provision of hardware for meetings with other stakeholders is partially available. 
Ultimately, the utilization rate over the years demonstrates a significant practical application of VR, even though 
it appears to be slightly declining in 2023. This trend could potentially be attributed to a lack of demand for VR 
implementation in projects from contractors (cities and investors), for instance. Regarding the application, some 
experts see the advantage in the mobile use of VR, as it lowers the entry barrier and in the developments of the 
products and integration in a broad market. Overall, it can be observed that there was a solid basis for the 
acceptance of VR in the construction industry for planning decisions, which has continued to grow over the years. 
The majority of the sampled planning and architectural offices exhibit attitudinal (Level 1) and application (Level 
2) behavioral acceptance. However, because utilization (Level 3) depends on certain factors, such as personnel, 
acquisition, strategic planning and access the acceptance of usability in 2020, is present in less than half of the 
offices. However, the subsequent studies have shown that the utilization of VR has not declined, which has led to 
the fact that the third level of acceptance has now also been reached by the majority. Among the smart cities, there 
was already an overall acceptance in 2020, according to the results of the majority of respondents, whereas the 
municipalities were not yet able to achieve an overall acceptance at that time, due to a lack of awareness, the 
resulting lack of test phases and a low willingness to invest. This results in poor values for this actor at all three 
levels of the acceptance measurement. 


While the mentioned potentials and barriers indeed prevail, an individual weighting of factors is not feasible. For 
instance, a barrier might be so formidable that it cannot be overcome, as exemplified by the purchase of software 
for small design firms or cities. Nevertheless, the results of the assessment of monetary readiness for the adoption 
of VR software in comparison to its benefits have demonstrated that numerous architects and engineers have been 
able to enhance the integration and implementation of VR. By surveying the barriers among the different 
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stakeholders, it was possible to identify factors for increasing the acceptance of VR for the visualization of 
planning in early design phases, which can be addressed accordingly from the developer perspective as well as the 
user perspective in further research. 


The provided example has demonstrated that visualizing digital buildings in Unity using the IFC format of a model 
is straightforward, while the exchange of design alternatives with the Variant Management is a bit difficult. In the 
future, an alternative approach could be explored to better integrate the stored options in this regard. Visualizing 
projects in the early design phases provides added value for stakeholders, and there is a recognizable public interest 
in VR visualization of the final product for public information activities and cooperation with other stakeholders. 
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ABSTRACT: Digitalization in the construction industry is increasingly striving to create digital twins in order to 
continuously exploit optimization potential in the management and utilization of existing buildings. Building 
Information Modeling (BIM)-based as-is or as-built documentation represents a promising basis in this context, 
which requires creating a geometric model for example based on point clouds as well as semantic enrichment in 
a Scan-to-BIM workflow. Conventionally, this is carried out manually by specialists on 2D screens and often is 
time-consuming and costly. The project "Building Inspector XR" addresses these issues and presents an intuitive 
solution for BIM-based as-is/as-built documentation using X-Reality (XR). In Virtual Reality (VR), BIM models 
are created off-site from point clouds and then are verified in Mixed Reality (MR) on-site. By integrating (partially) 
automated methods and targeting user-friendliness in our solution, Scan-to-BIM can be realized more efficiently 
and intuitively. In this paper, the focus lies on the innovative aspects of our XR application which encompass VR 
and MR environments, automation support, modeling schemes in compliance with BIM standards, and the 
registration of models in reality for MR. Additionally, the paper shows the interconnected toolchain that facilitates 
an efficient Scan-to-BIM workflow. 


KEYWORDS: Building Information Modeling, Virtual Reality, Mixed Reality 


1. INTRODUCTION 


The global and cross-industry trend of digitalization is increasingly shaping the construction industry as well. As 
a comparatively less digitalized industry, there is a great need for the development and implementation of digital 
methods in the construction industry in particular. The linchpin of the digital transformation in the construction 
industry is the cooperative working methodology Building Information Modeling (BIM). Compared to traditional 
methods, BIM offers the potential to improve communication and coordination between all construction 
stakeholders, make management for cost and time more efficient, and achieve higher levels of detail in digital 
building models by incorporating component-specific semantics in addition to geometry. Although the use of BIM 
is applicable across the entire lifecycle of buildings, in reality the collaborative working methodology is 
predominantly used for new structures. For existing buildings, BIM has not yet been applied frequently, although 
it could create added value for operation and utilization by enabling the resulting digital as-is/as-built models to 
be used to represent and evaluate the existing condition of the building and, based on this, to plan possible 
maintenance measures, carry out simulations or organize issue management more efficiently. 


For the application of BIM in this context, it is first necessary to capture the existing buildings. In most cases, such 
an acquisition process is carried out by means of laser scanning (terrestrial or mobile) or photogrammetry. The 
result of these procedures is a 3D point cloud. The subsequently necessary process of generating a digital building 
model from the available point cloud is summarized under the term "Scan-to-BIM" and is currently mainly 
implemented manually on two-dimensional screens using keyboards and mice. Therefore, we identified the need 
to improve the Scan-to-BIM process, by making it more intuitive in a 3D space using X-Reality (XR). In this paper, 
we present the project “Building Inspector XR”, which includes both a Virtual Reality (VR) and Mixed Reality 
(MR) application, for the intuitive creation of as-is/as-built BIM models based on existing point clouds. 


2. BACKGROUND 
2.1 Reality Capturing 


As a holistic approach to documenting structures, BIM places more far-reaching demands on building surveying. 
To meet these requirements, various methods of reality capturing are employed, including photogrammetry, 
terrestrial laser scanning and mobile laser scanning. 
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Photogrammetry is a method of deriving geometric information of an object from images (Luhmann, Robson, 
Kyle, & Boehm, 2014). A distinction is made between mono- and stereo-photogrammetry. While mono- 
photogrammetry is based on the analysis of single images, stereo-photogrammetry uses pairs of images to measure 
objects, such as buildings. Using photogrammetry, areas that are difficult to access can be surveyed with a 
comparatively high information density with Structure from Motion/Dense Image Matching. The respective 
achievable geometric accuracy is subject to a number of factors, such as the camera technology, the resolution of 
the individual images, the type of reference point determination (Donath, 2008) and specifically the image scale. 


In addition to photogrammetry, laser scanning forms another practical method for building acquisition and is 
considered the leading technology for the acquisition of 3D spatial information with high density (Lari, Habib, & 
Kwak, 2011). From a functional point of view, a distinction is drawn between laser scanners with impulse and 
phase measurement methods. Impulse measurement determines the time of flight of the laser beam, while phase 
measurement evaluates the phase shift of the reflected signal. While laser scanners with the phase comparison 
method are characterized by a significantly higher number of recorded points per second, laser scanners with the 
impulse measurement method offer a significantly higher range. Terrestrial laser scanning is characterized by 
stationary acquisition of local 3D data of the object from different locations and subsequent registration of the data 
of all locations relative to each other. In contrast to this, mobile laser scanning continuously acquires 3D data while 
the object is in motion and registers the data via positional information. 


The output of each of these methods is a high variety of 3D points representing the respective surface of the 
acquired object. These 3D point clouds can be used for further analysis and modeling. 


2.2 Scan-to-BIM 


In addition to the surveying workflow, the holistic BIM approach to the acquisition, management and exchange of 
building information also has implications for the processing and modeling of data (Blankenbach, Schwermann, 
& Becker, 2021). The process of creating or reconstructing as-is/as-built BIM from 3D point cloud data is called 
Scan-to-BIM. For this process, individual geometric elements such as walls, doors or windows are initially created 
on the basis of the point cloud data. Additionally, to the geometric model, attributes such as materials, dimensions 
or further semantic information are linked and documented to the corresponding objects. Subsequently, the 
resulting BIM model is validated and checked for possible errors or inaccuracies to ensure compliance with 
requirements and standards. Currently, this process is predominantly handled manually by professionals using 
(3D) computer-aided design (CAD) or BIM authoring software, such as Autodesk Revit (Autodesk, 2023). While 
automated solutions for this process are being progressively developed in both research and industry, each of the 
currently available off-the-shelf Scan-to-BIM software packages presently still requires significant manual user 
input making the entire process cumbersome and error prone (Son, Kim, & Turkan, 2015). 


2.3 State of the art 


(Adekunle, Aigbavboa, & Ejohwomu, 2022) compiled a systematic literature review network analysis on the topic 
of Scan-to-BIM in 2022. According to this, Scan-to-BIM is being researched worldwide with a view to finding 
more efficient solutions, which occasionally also include the use of VR and AR technologies for information 
management. However, specific works are not mentioned since the paper focusses to summarize, categorize and 
analyze the different topological backgrounds and their nationally backgrounds. Therefore, no specific work for 
modeling in VR or Augmented Reality (AR)/MR is addressed. (Wu, Hou, & Zhang, 2021) and (Alizadehsalehi, 
Hadavi, & Huang, 2020) show studies that aim to compile various publications related to BIM and XR applications. 
(Wu et al., 2021) outlines a BIM-XR application as a system combining a BIM database and a human-machine 
interactive interface for context-aware visualization and interaction. Generally, it highlights that XR can enable 
modifications and updates to the BIM model and offers intuitive visual representations and interactive experiences 
within the BIM context. However, the study primarily emphasizes the visualization aspect and does not present 
detailed approaches for geometric and semantic BIM modeling in VR or MR environments. Specifically, the focus 
is on AR/MR for post-construction BIM model adjustments, while VR is considered mainly for visualization 
purposes. (Alizadehsalehi et al., 2020) discuss the benefits of XR technologies for construction project simulation 
and present a comprehensive overview of XR applications. Nevertheless, while acknowledging the potential of 
using XR with BIM for interactive visualization, the study focuses on approaches to transform pre-designed BIM 
models into XR rather than intuitive modeling in VR and MR. 


Commercial products such as Enscape (Chaos, 2023) or Twinmotion (Epic Games, 2023a) allow a visualization 
of 3D models in VR as well as creating images, animations or walkthroughs, however, both require importing 
previously created models and do not provide the possibility of modeling, except for simple drag and drop 
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functions via the desktop computer editor, in VR. Further XR possibilities, such as the transformation into AR or 
MR, are also not possible. In comparison to our solution, VR Sketch (VR Sketch, 2023) presents a similar approach. 
Nevertheless, it lacks comprehensive BIM capabilities, making it incapable of conforming to BIM standards when 
modeling components. Moreover, the software does not support point clouds. Arkio (Arkio, 2023) offers a platform 
specifically developed for collaborative design and architectural conception, focusing on free geometric modeling 
ina VR environment. For this, existing BIM models and 2D plans can be imported as a data basis and the created 
3D geometric models can be exported into selected BIM authoring software, such as Autodesk Revit or BIM360. 
However, Arkio does not provide BIM-specific functionalities to ensure BIM-compliant (as-built) modeling, like 
complying with existing standards, providing standardized component catalogs or semantic enrichment. Also, 
Arkio is limited to proprietary file formats and does not contain the option of exporting to the open Industry 
Foundation Classes (IFC) (buildingSMART, 2019) standard. 3D models can also be created and edited in AR, but 
this feature is limited to tabletops. GAMMA AR (GAMMA Technologies S.à r.l, 2023) on the other hand offers 
the possibility to overlay BIM models on construction sites. This integration aids in error prevention and precise 
component monitoring. Nonetheless, it does not support actual modeling activities. BIM Holoview (BIM 
Holoview Ltd., 2023) combines VR and AR/MR to enable users to view BIM models using Meta Quest 2 (Meta, 
2023) and Microsoft HoloLens (Microsoft, 2023). However, it is limited to Autodesk 3D Revit and Navisworks 
files, restricting modeling functionalities. Unity Reflect (Unity Technologies, 2023) encompasses the broadest 
range of capabilities. It allows users to view BIM models in various ways, including VR, AR/ MR, on multiple 
devices, nevertheless, it does not support the creation of BIM models in real-time. 


3. METHODOLOGY 


The Building Inspector XR revolutionizes the process of building inspection by harnessing the power of VR and 
MR technologies. This chapter presents an overview of the process chain involved in the Building Inspector XR, 
covering its architecture, hardware and software choices, and the functionalities it offers. Through the utilization 
of point clouds and BIM models, accurate pose tracking, and seamless integration of virtual content into the 
physical environment, the Building Inspector XR system streamlines the Scan-to-BIM process. 


3.1 X-Reality for Scan-to-BIM 


The XR BIM system is the foundation of the Building Inspector XR. Fig. 1 illustrates the system's structure, where 
a point cloud, generated through photogrammetry or laser scanning, serves as the initial data in VR. With the point 
cloud as a context, the user creates a BIM model, which can be exported as an IFC file for interoperability or 
brought into MR for on-site enhancement. This seamless interchangeability between VR and MR enables efficient 
workflows, supporting models based on the IFC building and waterways domain. Choosing the right hardware and 
software is crucial for the success of an XR system. For the Building Inspector XR system, the Valve Index (Valve 
Corporation, 2023) and the Microsoft HoloLens 2 were selected. The Valve Index, a tethered VR headset, provides 
high-quality pose tracking necessary for accurate movements in VR. On the other hand, the Microsoft HoloLens 
2 was chosen for its MR capabilities, including accurate pose tracking, gesture-based interactions, and immersive 
visuals. Software plays a significant role in the Building Inspector XR system, with Unreal Engine (UE) (Epic 
Games, 2023d) serving as the development framework of choice. UE offers a broad application field beyond 
gaming, supporting various file formats and a wide range of hardware, including the Valve Index and the Microsoft 
HoloLens 2. The system leverages relevant plugins to extend its functionalities, such as the Datasmith Plugin (Epic 
Games, 2023c) for importing IFC files and the LiDAR Point Cloud Plugin (Epic Games, 2023b) for importing 
point clouds. The open source nature of UE allows for customization and modifications to the core engine 
functionalities. 
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Fig. 1: Architecture of the Building Inspector XR. 


The LiDAR Point Cloud Plugin on the one side offers an import interface and on the other side point cloud 
rendering methods to visualize the point clouds in UE. For starting the modeling process in XR, the initial point 
cloud can be provided in a couple of different formats, such as XYZ, PTS, LAS/LAZ or E57. Once available in 
the system, multiple parameters of the LIDAR Point Cloud Plugin ensure that the point clouds are rendering 
efficiently. We configured the plugin so that the point clouds are visualized with the highest quality possible, while 
maintaining a high performance. A highly detailed presentation of point clouds is crucial, so that as much 
information can be derived from the data as possible during modeling. Like this, small details of the environment 
can be already identified and modeled from the point clouds, enhancing the value of the BIM model, and 
minimizing additional modeling work later in MR or post processing (Fig. 2). 
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Fig. 2: Point cloud in VR. 


In XR, but especially in VR, rendering performance is crucial for a smooth and realistic user experience. A high 
frame rate of 90 frames per second (FPS) or higher is recommended to reduce motion sickness and enhance the 
overall user experience. To address the challenge of rendering large amounts of data, optimization techniques are 
employed. These include an efficient octree data structure and spatial partitioning. Level of Detail (LOD) systems 
are used to reduce complexity as objects move farther from the camera, improving performance. The VR system 
provides intuitive interactions for easy BIM model creation. The VR controllers enable users to access the menu 
attached to the left controller and interact within it. Over the menu all required functionality to create a complete 
BIM model is accessible to the user, including geometry and semantic data modeling methods, editing tools and 
export functionality. The right controller is reserved for the actual modeling process. Teleportation allows instant 
movement to designated spots in the virtual environment, while fly-mode enables reaching higher locations. Pose 
tracking allows physical movement within smaller areas, accurately replicated in the virtual environment. 
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In contrast to VR, the MR system of the Building Inspector XR does not require a complete virtual world (Fig. 3). 
Instead, it integrates the virtually modeled objects into the physical environment. This is specifically challenging, 
because the virtual elements modeled in VR need to be placed as accurately as possible in reality, so that they 
overlay their physical counterparts. Like this, the user can inspect elements, for instance by comparing the modeled 
with the real situation. Also, details can be added to the BIM elements, for example additional attributes such as 
material or even further geometric information. This is important, if objects or details were not visible in the point 
cloud, either because they were covered by other objects during reality capturing or too small so that they are 
hardly or not visible in the point cloud. 


Fig. 3: MR application. 


Accurate alignment between the virtual and physical worlds is achieved through pose tracking, which seamlessly 
integrates the virtual and real elements. The system employs a Visual Simultaneous Localization and Mapping (V- 
SLAM) algorithm on the Microsoft HoloLens 2 for accurate pose tracking. This algorithm keeps track of the user’s 
pose (position and orientation), thus, of the Microsoft HoloLens 2, keeping virtual objects robustly anchored to a 
location in reality. To achieve an initial accurate alignment between the virtual objects and reality, a registration 
method is employed. The Building Inspector XR system utilizes the Kabsch-Umeyama algorithm (Umeyama, 
1991), a widely used method for aligning and comparing similarity between two sets of points in multidimensional 
space. The goal of the algorithm is to find the optimal rigid transformation matrix that aligns one set of points to 
another set of points. It can roughly be broken down into three steps: 


1. Calculate the centroids of both point sets. 
2. Transform both datasets to the origin and then calculate the required rotation. 
3. Calculate the required translation and scale. 


For MR registration, this can be used by defining corresponding points in the virtual world and reality. Optimally, 
points are chosen that can easily be identified in both spaces, for instance the comers of a door or window (Blut 
& Blankenbach, 2021). We therefore provide the means for users to actively select corresponding points in the 
BIM model and in reality. The gesture-interaction system of the Microsoft HoloLens 2 allows easy and intuitive 
point selection using fingers. And with the spatial understanding of the Microsoft HoloLens 2, points can 
accurately be placed in reality. Since the points need to correspond to each other, we provide numbered spheres 
that simply must be placed in the desired spots by drag-and-drop interaction (Fig. 3). Once all spheres have been 
placed, the user only needs to confirm to perform the instant alignment. This referencing process can be repeated 
as needed to maintain accuracy. 


Once virtuality and reality have been aligned, the user can start the inspection or modeling process. Due to the 
head-mounted nature of the Microsoft HoloLens 2, the user has both hands free for interactions. We provide a 
floating menu with the same UI and functionalities as in VR, so that creating a BIM model is as intuitive in both 
spaces but optimized for the technology. Creating and interacting with objects using Microsoft HoloLens 2 is as 
easy as using the point or pinch-gesture. 
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3.2 BIM-compliant as-is/as-built modeling 


By combining VR and MR, the Building Inspector XR enables the creation of models completely in 3D space. 
Furthermore, modeling as well as interactions are carried out intuitively by gestures and voice input. In terms of 
functionality, we placed emphasis on advanced geometric modeling, linking semantic object data, and compliance 
with standards for ensuring a standardized BIM model (as-is/as-built model). Within the scope of the associated 
research project, we focused on the as-is/as-built BIM modeling of building as well as water engineering structures. 


The Building Inspector XR offers users three different approaches to geometric modeling, tailored to the 
complexity of physical objects. The first method is free modeling (Blut et al., 2023). Simpler objects like 
rectangular walls, floors, or ceilings can be created by specifying just two diagonally opposite points and the 
software automatically generates a solid object with parallel edges. This method ensures quick and precise 
modeling. For objects with more intricate geometry, the parametric modeling feature is available. Users input a 
minimal set of parameters, from which all the necessary geometric information is derived. Step-by-step 
instructions guide the users through the process. For example, when modeling a ladder, the user only needs to input 
the base point, height, and width, with adherence to relevant standards such as the German DIN 18799 for fixed 
ladder systems on structures. The DIN 18799 specifies that the width of a ladder may be between 400 and 600 mm 
and that the distance between the ladder rungs must be between 225 and 300 mm. Furthermore, it defines that the 
first and last rungs should be at least 100 mm and a maximum of 400 mm from their respective ends of the ladder. 
When the user inputs the required parameters, these value ranges are considered. If any value exceeds or is below 
the permitted ranges, the respective upper or lower limits are set as new modeling parameters to ensure that the 
modeled object is in accordance with the prevailing standards. Based on the resulting dimensions of the ladder, 
the two side rails are first created, and then the number and positions of the individual rungs are automatically 
generated, considering a standard-compliant entry and exit spacing of 200 millimeters. 


In cases where manual modeling proves challenging due to the complexity of the object, the Building Inspector 
XR provides an integrated catalog of pre-modeled components. These components come with essential attributes 
and can be easily dragged and dropped into the desired positions. This feature significantly speeds up the modeling 
process, particularly for highly complex objects. For example, a niche bollard can be quickly inserted using this 
method (Fig. 4). To enhance modeling capabilities further, the Building Inspector XR enables users to selectively 
cut out areas defined by the user using Boolean operations. This allows for the insertion of additional components 
or modifications to existing objects. 


Fig. 4: Placing pre-modeled IFC components in the scene. 
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A plane fitting algorithm assists users in accurately placing objects on a plane, ensuring proper alignment between 
components and reducing modeling time. For the detection of surfaces the Random Sample Consensus (RANSAC) 
algorithm is used. The goal is to find, in several iterations, the area where the largest number of points from the 
point cloud lie in a plane, i.e., where the distances of all points to the respective plane are minimal, by randomly 
selecting three points to form a plane in each iteration. Then, the distances of the remaining points to the plane are 
determined. The plane of the best iteration is stored. RANSAC ensures that outliers are effectively eliminated (Fig. 
5). 


Fig. 5: Detected planes using the plane detection algorithm. 


The Building Inspector XR not only focuses on geometry but also emphasizes the assignment of semantic data to 
objects. By adopting the IFCx4.3 (buildingSMART, 2023) standard as its data model, the software ensures 
compliance with BIM standards throughout the modeling process. Users can input specific attributes such as name 
and description for each object, and additional information can be freely assigned using IFC’s property sets. The 
software follows a clear hierarchical structure, enabling users to classify objects, create object-specific properties, 
and assign components to the IFC hierarchy. Adhering to the IFC hierarchy is crucial for effective BIM 
implementation. Furthermore, it ensures that modeled objects are correctly classified and organized within the 
BIM model, facilitating efficient data exchange and collaboration among project stakeholders. The IFC data model 
is reflected 1:1 already in the application, so that BIM models can easily be exported without loss of data. This 
was solved by creating the corresponding IFC classes in UE and placing them according to the place in the IFC 
hierarchy in the UE scene graph. When the user during the modeling process creates a new object, this object can 
be filled with attributes according to the standard and placed as a parent or child of other objects. 


Moreover, the Building Inspector XR facilitates the export of the completed BIM models in the standardized IFC 
(IFCx4.3 standard) format. The resulting IFC file in STEP Physical Format (IFC-SPF) is readable in any IFC- 
compliant software, allowing for visualization, editing, and extension of the model using BIM viewer or BIM 
authoring software. 


The integration of water engineering structures into the Building Inspector XR further enhances its versatility and 
applicability in various construction projects. Users can now model, analyze, and visualize not only buildings but 
also water engineering elements such as locks, dams, and canals. The inclusion of these new IFC classes 
demonstrates the commitment to providing a comprehensive XR BIM solution that caters to a wide range of 
construction disciplines. 
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4. BUILDING INSPECTOR XR IN PRACTICE 


In this section the practical experience with the Building Inspector XR is described. To evaluate the Building 
Inspector XR, we modeled an office. To obtain the base data for modeling, first, the office and the surroundings of 
the office were captured with the geodetic terrestrial laser scanner Riegl VZ400 and contained 21 million points 
(Fig. 6). Subsequently, the resulting point cloud was cleaned from outliers and unnecessary data and exported as a 
LAS file, since the format proved to work well with the UE LiDAR Point Cloud Plugin. With LAS, importing 
goes quickly and all data is transferred correctly. On a computer with an AMD Ryzen 9 3900X 12-Core Processor 
with 3.80 GHz, 32 GB RAM and a M.2 SSD, the LAS file could be imported in less than a second. Importing the 
same point cloud in E57 format took roughly 20 seconds. 


Er 


Fig. 6: Point cloud from terrestrial laser scanner of an office in the Building Inspector XR. 


A model was created in VR and then later transferred to the Microsoft HoloLens 2. The registration method for 
aligning the virtual world with reality proved to be efficient, so that the previously modeled objects overlayed their 
physical counterparts accurately. Three corresponding points were used. The distribution of these three points 
across the room was crucial, so that an optimal transformation could be calculated. Therefore, we placed on point 
in the corner of the room, one point on the other side of the room on the corner of a window and the third point on 
the corner of the door in the wall between the first two points. This provided the maximum distribution of points 
in the room. The resulting alignment had an accuracy of under | cm. After modeling, the BIM model was exported 
as a IFC-SPF. The created BIM model, i.e., the resulting IFC file then could be loaded without problems in different 
BIM viewers, such as BIMvision (datacomp, 2023). 


The evaluation showed that the Building Inspector XR has a distinct advantage over professional and highly 
complex BIM authoring software. Inexperienced users could effortlessly create IFC-conforming models using this 
system. The immersive nature of VR and MR played a significant role in this achievement as it made handling 
virtual tools and dealing with complex data like point clouds and IFC models much more accessible. Users found 
the interface intuitive and were able to interact with the models in a natural manner, similar to real-world 
interactions. The users' experiences revealed that certain modeling tasks, particularly those involving large objects, 
were most efficiently performed in VR. The flexibility of locomotion in the virtual environment allowed for 
quicker and more fluid modeling. On the other hand, when it came to adding finer details and object-specific 
information, the MR application proved to be more advantageous due to its ability to provide better context for 
these additions. One notable result of the system's efficiency was the ability to model a substantial structure in 
only a short time. This speed and ease of modeling emphasized the system's capability to create BIM models in a 
highly efficient manner. Overall, the combination of VR and MR in the XR BIM system demonstrated its potential 
to empower users, regardless of their experience level, to produce accurate and conforming BIM models in a more 
intuitive and time-effective way. 


5. CONCLUSION 


With the Building Inspector XR, we aim to enhance the Scan-to-BIM process by developing an efficient workflow 
that incorporates VR and MR, ensuring the creation of BIM-compliant as-is/as-built models in a standardized IFC 
structure, which includes the latest state of the art and thus in addition to building construction also water 
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engineering classes. By transferring BIM to VR and MR, integrating automation, implementing modeling schemes 
based on BIM standards, and facilitating model registration for MR, we have successfully developed a more 
intuitive workflow in this context, as users can experience a more immersive and interactive environment for 
modeling and inspecting existing structures, especially using XR technologies. 


For now, we have focused on building and water structure engineering with a selection of elements, therefore, the 
current capabilities of the Building Inspector XR do not cover all structure types and components but demonstrate 
the efficiency of our approach. We believe that the Building Inspector XR has the potential to expand its 
applications in the construction industry. By leveraging artificial intelligence, opportunities to automate modeling 
tasks can be explored, improving efficiency and reducing human error. In addition to progressively expanding the 
functionalities of the Building Inspector XR, the inspection aspect, in particular, holds significant promise for 
further development and extension. To validate and refine the solution, further testing and implementation in real- 
world scenarios is essential. This would provide valuable insights into the effectiveness and practicality of the 
Building Inspector XR in real-world construction projects and enable further optimizations and adjustments based 
on feedback and requirements. 
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ABSTRACT: In the Architecture, Engineering and Construction (AEC) sector, data extracted from building 
information modelling (BIM) can be used to create a digital twin (DT). The algorithms of a BIM-based DT can 
facilitate the retrieval of information, which can then be used to improve building operation and maintenance 
procedures. However, with the increased complexity and automation of the building, maintenance operations are 
likely to become more complex and may require expert intervention. Collaboration and interaction between the 
operator and the expert may be limited as the latter may not be on site or within the company. Recently, extended 
reality (XR) technologies have proven to be effective in improving collaboration during maintenance operations, 
through data display and shared interactions. This paper presents a new collaborative solution using these 
technologies to enhance collaboration during remote maintenance operations. The proposed approach consists of 
a mixed reality (MR) set-up for the operator, a virtual reality (VR) set-up for the remote expert and a shared Digital 
Model of a heat exchanger. The MR set-up is used for tracking and displaying specific information, provided by 
the VR module. A user study was carried out to compare the efficiency of our solution with a standard audio-video 
collaboration. Our approach demonstrated substantial enhancements in collaborative inspection, resulting in a 
significative reduction in both the overall completion time of the inspection and the frequency of errors committed 
by the operators. 


KEYWORDS: Virtual Reality; Mixed Reality; Operation & Maintenance; Collaboration; Digital Twin 


1. INTRODUCTION 


From all the new methodologies and technologies brought by the latest industrial revolution known as Industry 
4.0 (14.0), some of the most explored in the last years are Digital Twins (DT) and eXtended Reality (XR) 
technologies (Augmented Reality (AR), Mixed Reality (MR) and Virtual Reality (VR)) (Gartner Top 10 Strategic 
Technology Trends for 2023, 2023; Jamwal et al., 2021). Numerous studies have already proven that these 
technologies can improve industrial performance, but also building exploitation. A previous work has been done 
to summarize all these improvements, focusing on the ones brought to maintenance procedures in the Architecture, 
Engineering and Construction (AEC) sector (Coupry et al., 2021). It has been shown that data extracted from the 
building information model (BIM) can be used to create BIM-based DT. Such DT can be likened to a centralized 
database where real-time and static data of an equipment can be gathered and retrieved or used to predict the 
equipment behaviour and, thus, to compute the optimal maintenance time. Thanks to the centralization offered by 
a DT, different stakeholders can participate more actively in maintenance procedures, adding equipment-specific 
information or even checking it before maintenance is needed. In this context, XR devices can be used by on-site 
operators to display this information in front of the equipment, giving them access to the data needed to perform 
a maintenance operation. 


However, occasionally, the on-site operator may require more specific assistance in resolving certain issues he or 
she may encounter. The increased complexity of systems and procedures, brought about by 14.0, may necessitate 
contacting a remote expert. Such assistance may also be required in the case of maintenance work on equipment 
with which the operator is unfamiliar. With the impact of Covid-19 and the increasing costs of transport, it is now 
needed to provide new methods for remote collaboration with an expert. XR devices can be used to provide 
meaningful information on both sides of such a collaboration. Either using virtual representations or shared video, 
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the XR devices provide remote experts with contextual information, such as the position and orientation of the on- 
site user in relation to the inspected equipment. These devices also provide advanced methods to display 
information to the on-site user, either through localized annotations or specific data related to the equipment. 


Speech communication is the most common method of information exchange during a remote collaboration. Visual 
cues, such as sharing visual context, are crucial to enhance collaborative performance. Such information can be 
obtained through 3D reconstructions, which can be static (Kolkmeier et al., 2018; Piumsomboon et al., 2017) or 
in real-time using depth sensors (Bai et al., 2020; Gao et al., 2016). To scan the, though, specific cameras are 
typically required. Furthermore, if any changes were made since the last capture, the static models’ reliability 
decreased, which also affected the accuracy of the information shared with the expert. Sharing view has also been 
explored, either by limiting the expert’s perspective to that of the operator (Serubugo, 2018), or by using 360° 
video, which provides real-time information while allowing the expert to move his vision freely, independently 
from the operator’s (Teo et al., 2019). A 360° camera is thus required, which could burden the on-site operator 
unnecessarily. 


Oral exchange and sharing context are not the only elements required for a good collaboration, visual aid is also 
important. While some researchers found that the use of annotations (Anton et al., 2018; Fakourfar et al., 2016) 
can be helpful to share specific location or elements, others observed that sharing gaze (Bai et al., 2020; 
Piumsomboon, Dey, et al., 2019) can help the collaborators to understand where everyone is looking. Sharing 
gestures has also been studied and proved to be useful for specific manipulations, such as assembly tasks or 
localisation issues (Chenechal et al., 2016; Wang et al., 2019). The use of a 3D avatar to show the user’s movements 
and position has also proved to be helpful in increasing performance and decreasing the mental effort of the 
operator (Piumsomboon, Lee, et al., 2019). A method is proposed by (Grandi et al., 2019), allowing asymmetric 
collaboration between two users using handheld both AR or VR devices. Even if users could manipulate a 3D 
model to complete docking tasks, this system allows interactions only with a virtual object, not a physical one. 
Another work by (Ladwig, 2019) allows a VR user to interact with the 3D representation of a suitcase/machine. 
Each action performed by the VR user activates a LED to the physical to inform a local user which action to 
perform. This solution comes closest to using a DT to assist a field operator. (Oda et al., 2015) used so called 
“virtual replicas” to communicate between VR and reality. These replicas are copies of tracked physical machine 
parts that are rendered accordingly at the correct position in the virtual environment in relation to the machine. 
Wang et al. have already shown of these virtual replicas can be used to improve remote collaboration by projecting 
the remote expert gestures to the local operator (Wang et al., 2023). However, projection is not always possible 
due to lightning issues or narrow operating spaces. 


The solution proposed in this work draws its inspiration from all these projects. It consists of a new system where 
a remote expert and a local operator can both use real-time audio-video feedback and 3D models to interact with 
each other. The system is using collaboration techniques from both asynchronous (checking explanations 
beforehand) and synchronous (physical positions of the operator and the system) collaboration systems. A user 
study has been conducted on the impact of this solution during a collaborative remote inspection of a heat 
exchanger. The inspection consists of several manipulations, requiring both one-handed and two-handed 
operations. The rest of this paper is organized as follows. In Section 2, the framework of our solution is presented. 
Section 3, describes the user study performed to validate the usability of the solution, followed in Section 4 by the 
analysis of the results. In Section 5, these results are discussed. Finally, Section 6 presents the conclusions and 
thoughts on the remaining work to be done on the solution. 


2. FRAMEWORK 
2.1 Prototype setup 


The prototype solution design focuses on binding both audio-video exchange and immersive 3D interactions into 
a single cross-platform solution. This solution uses MR and VR to connect a local operator with a remote expert 
for real-time remote interaction. We implemented the solution with Unity3D and C#. Our audio-video exchange 
protocol is built upon Web Real-Time Communications (WebRTC) library (WebRTC, n.d.). The solution consists 
of a MR client and a VR client, both based on the same application. The Photon Unity Networking (PUN 2) plugin 
is also used to allow the remote expert to share specific information with the local operator (Photon Unity 
Networking, n.d.). Fig. 1 shows the overall setup of the project. Our solution is developed using the OpenXR norm, 
allowing our solution to became cross-platform (OpenXR, 2016). In Fig. 1, the “BIM-DT” section represents a 
BIM-based DT. It consists of the shared 3D representation of the physical twin, a history database containing 
semantic data and the results of previous operations, and a sensor database, where all the data collected from the 
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physical twin is stored (see Fig. 1, red arrow). The BIM-DT also contains a simulation model, with which the VR 
and the MR user can interact if necessary to simulate specific situations or procedures. This uses data from both 
databases. 
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Remote Local 


Expert setup Operator setup 


Direct 
interaction 


BIM-DT " i 


1D 


~> Data exchange 


=>  Datacollected 


Fig. 1: Schematic representation of the proposed setup consisting of (left) remote expert setup, with VR device 
and computer, (right) local operator setup, with MR device and physical system, and (bottom center) the BIM- 
DT. 


The local operator side consist of the physical twin of the system (cf Fig. 1 “Physical Twin”), on which the 
maintenance is performed, and a Microsoft Hololens 2 (Microsoft, n.d.-a) to use the MR client of our solution (cf 
Fig. 1 “MR”), developed using the Mixed Reality Toolkit (MRTK) provided by Microsoft. The local operator will 
be referred to as the Operator for the remainder of this paper. The remote expert side consist of a Meta Quest 2, 
wired to a VR-ready laptop PC (Intel Core i9, 32GB RAM, Nvidia RTX 3080), running the VR client of our 
solution, through Oculus Link connection (cf Fig. 1 “VR”). The rendering capabilities of the PC can manage and 
host the network connection between the HMDs. The Windows Device Portal (WDP) web server can also be used 
by the remote expert to record the conversation with the Operator, using the Mixed Reality Capture function 
provided. The PC also host a node-dss server for the WebRTC connection between the VR and the MR clients. 
The remote expert is also provided with specific documentation on how the maintenance should be performed on 
the system. The remote expert will be referred to as the Expert for the remainder of this paper. The Expert and the 
Operator are located in different rooms in the same building during the collaboration. An audio-video exchange is 
provided between both clients using the WebRTC protocol, which provides peer-to-peer real-time audio and video 
communication for collaborative applications (see Fig. 3. (blue arrows for Operator, red arrows for Expert)). 


2.2 Information exchange paradigm 
2.2.1 3D representation 


A 3D model representation of the system is implemented. This model is loaded only on launch, for the VR client, 
or when a specific QR code is identified, for the MR client, using a specific SDK developed by Microsoft 
(Microsoft, 2022). Once the QR code is found, the 3D model is loaded in relation to its position. The model is 
shared between the MR and the VR client and contains the scripts allowing the exchange of information between 
the two clients, using the PUN plugin. This plugin provides us with a specific feature called Remote Procedure 
Calls (RPCs), allowing each client to call methods on remote clients in the same room. This feature has enabled 
us to set up an asynchronous interaction system for our solution. The PUN plugin also allows us to create avatars 
to represent both clients. The avatars used are composed of a white sphere with makeshift glasses, to inform the 
other client where each avatar is looking. This representation of the users allows for a better communication 
between them (Piumsomboon, Lee, et al., 2019). In our case, we have decided to use a God point of view situation, 
where the Expert’s avatar is on a higher Y-level than the Operator’s one, allowing the Expert to see where the 
Operator is placed in the physical space in relation with the physical system (Piumsomboon et al., 2017). 


2.2.2 Replica paradigm 


Our asynchronous interaction system is using the Replica method for interactions. Based on the concept of Voodoo 
dolls, brought by Pierce et al, this method consists of creating a reduced copy of a 3D model instead of a direct 
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interaction with it (Pierce et al., 1999). Once the copy has been created, any interaction with it is reproduced to 
scale on the initial model. In 2015, Oda et al. took up a similar method for exchanging real-time visual 
manipulations during remote assistance for maintenance procedures (Oda et al., 2015). 
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B MR {e) 
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Fig. 2: Schematic representation of our Replica interaction system 


Our interaction system is based on the same principle: while performing a direct interaction with the shared 3D 
model (Fig. 2 a), each client creates a Replica of the 3D model (Fig. 2 b). This Replica can be moved independently 
from the shared model (Fig. 2 c). Interactions (modified elements, annotations, colour changed...) carried out on 
the client’s Replica are specific to it, thus only its owner can see them (Fig. 2 d). If the client wants to share his 
modifications, he or she needs to Synchronize his Replica with the shared 3D model, as shown in Fig. 3 (green 
arrows). Once the synchronization is asked, the modifications of the client’s Replica are applied to the shared 
model (Fig. 2 e). If any annotation had already been added to the shared model, these are retained. If the shared 
model has already been modified and the request is made by the client identified as the Expert, its modification 
takes precedence. 
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Fig. 3: Representation of the interaction with the Replica (green) and the audio-video communication (blue & 
red) 


This method allows both clients to assess different modifications or add several annotations at once before sharing 
them with the other. It can also allow the client to check the information before sending it, thus avoiding the 
transmission of incorrect information. If the 3D representation is the digital representation of a Digital Twin, this 
method also allows both clients to use the DT’s simulation engine to observe the impact that a simulated event, 
such as specific manipulations, can have on the system. 


3. USER STUDY 


To evaluate our solution, we conducted a user study. Its major purpose is to evaluate the usability of our solution 
and the impact of the collaboration experience on the performance and resolution time on the Operator side. We 
have decided to make a comparison between two conditions: One using a standard audio-video call, called Tablet 
condition; the other using the MR client of our solution, called HMD condition. 
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3.1 Experimental protocol 


A total of 41 participants took part in the study, 12 women and 29 men. They were primarily students and teachers 
at our school. Participants were asked to sign up using an online form. Only participants with no prior knowledge 
of the machine were considered. Participants were arbitrarily assigned to one of the two conditions. A presentation 
of the experiment has been performed prior to commencing the session. For the Tablet condition, the Operator is 
invited to use a tablet Samsung Galaxy Tab A. The audio-video call is made via the Teams application (Microsoft, 
n.d.-b), through the use of a unique account. It has been decided not to use earphones or headphones during the 
call to simulate an on-site situation, in which the use of this type of device can be difficult due to hearing protection. 
For the HMD condition, the Operator is provided with the MR client of our solution. A 5-minute training is 
performed by the participants to familiarize themselves with the gesture recognition interaction system of the 
device. Prior to the experiment, to avoid any bias due to a lack of knowledge of the system, it has been decided 
that the Expert would be embodied by a unique actor. The Expert is already trained in the use of the VR client, 
which avoid any issue due to a misunderstanding of the client interaction system. 


Fig. 4: The Expert (a) is moving the valve 2V4 to the Operator on his Replica (b). The model is synchronized to 
show the information to the Operator (c). The Operator can then move the right valve (d). 


The Expert is provided with a unique inspection plan for both conditions. In the Tablet condition, the Expert 
provides vocal instructions to guide the Operator to locate the objects he is required to interact with, through 
detailing the relationships between them. This method proved to be more effective than using visual context (Teo 
et al., 2019). In the HMD condition, the Expert use both vocal guidance and the Replica paradigm to provide 
information to the Operator, as seen in Fig. 4. During the task performance, the video communication and the 
participants view were recorded using the in-built system of the device used (record system of Teams for the tablet 
condition, record system of Windows Device Portal for the MR condition (Karl-Bridge-Microsoft, 2023)). 


3.2 Tasks 


The scenario simulates an issue with the hot water delivery temperature obtained by the heat exchanger (see Fig. 
5). The system is composed of fourteen valves and two different heat exchangers: a shell-and-tube heat exchanger 
and a plate heat exchanger. This system was chosen for its versatility, as heat exchangers are often found in 
buildings (HVAC, plumbing systems...). In both conditions, the Operator is asked to call an Expert to help him 
identify the issue with the exchanger. Tasks include handling specific valves to alternate between a shell-and-tube 
heat exchanger and a plate heat exchanger, and to alternate between parallel-flow and counter-flow current, to 
change the efficiency of the heat exchange. The scenario used by the Expert is divided into two parts: Inspect the 
system and Initial State. Each part is divided in both Manipulation blocks, where the Operator is expected to 
interact with the system, and No manipulation blocks, where the Operator is invited to give specific information 
to the Expert through descriptions. The Manipulation blocks are divided into two types: “/-handed” and “‘2-handed” 
tasks. In the /-handed tasks, four operations must be performed, using one hand. In the 2-handed tasks, only two 
operations must be performed, but these tasks required to use both hands. Fig. 4 shows an example of interaction 
in the MR condition. The Expert uses the Replica paradigm to indicate the correct valve to handle (see Fig. 4 (a) 
& (b)). Once the valve highlighted, the Expert synchronize the system to update the shared 3D model and shows 
the information to the Operator (see Fig. 4 (c)). Then, the Operator can move the correct valve (see Fig. 4 (d)). 
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Fig. 5: Physical twin of the heat exchanger with names of the valves. 


After completing the Inspect the system part, the Operator is asked to inform the Expert of the hot water outlet 
temperature obtained. Once the Expert has explained the reason for the system’s temperature issue and what should 
be done to correct it, the Operator is asked to return the system to its initial state. In this part, the Expert can change 
the order in which the valves are handled in the “/-handed” blocks, to avoid a repetition bias with the first part. 
After completing the /nitial State part, the Expert summarizes the operations conducted by the Operator and the 
conclusion of their common inspection. Then, the Operator is invited to end the call with the Expert. 


3.3 Hypotheses and metrics 
In this user study, we stated the following hypotheses: 


H,: The performance time from the completion of the collaborative inspection is faster in the HMD condition. 
Hy: The number of errors is lower in the HMD condition. 


To verify our hypotheses, we have used specific metrics. Audio and video of the Operator viewpoint were recorded 
during each experiment for subsequent analysis of the interaction. A timer is started by the test conductor on the 
Operator side at the beginning of the call. For each task, breakpoints are recorded. Errors are also recorded 
whenever a wrong valve is identified or handled by the Operator. 


4. RESULTS 


Prior to the analysis of the results, we have decided to exclude two participants. One for the Tablet condition, 
where the participant spent most of the experimenting commenting on the relevance of the explanations given by 
the Expert, and one for the HMD condition, where the participant had difficulties understanding the purpose of the 
instructions given by the expert. The following results are thus obtained from 19 participants for the Tablet 
condition, and 20 participants for the HMD condition. We performed Shapiro-Wilk (SW) tests on all measurements. 
For the results non-normally distributed, we performed Mann-Whitney-Wilcoxon (MWW) test on all 
measurements. For the results normally distributed, we performed a one-way ANOVA test to compare the mean 
of the samples. 


4.1 Completion time 


We measure performance time required to complete each experiment. Once the analysis of the overall performance 
time of the experiment has been analysed, we carry out a detailed analysis of performance times for /-handed and 
2-handed manipulations, as well as for phases of the experiment where only vocal instructions were given by the 
Expert to the Operator on both conditions. 
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4.1.1 Global observations 


Fig. 6 shows that there is a clear difference in completion time in seconds between the two conditions. The SW 
test shows that both the Tablet condition (W=0.98, p-value=0.96) and the HMD condition (W=0.94, p- 
value=0.267) results follow a normal distribution. For the Tablet condition, participants took an average of 763.65 
seconds to complete the experiment (SD=76.80). The participants of the HMD condition only took an average of 
623.55 seconds to complete the experiment (SD=67.70). Because they follow a normal distribution, we use a one- 
way ANOVA to compare the results. The tests shows that the participants in the HMD condition were significantly 
faster than those in the Tablet condition (F (1,37) = 36.6, p-value < 10°). 


Total time to complete the inspection 
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Fig. 6: Total performance time for Tablet condition and for HMD condition 
4.1.2 By task (-handed vs 2-handed) 


Manipulation blocks are divided into two types of tasks. Fig. 7 (a) shows the time difference in seconds between 
both conditions for the /-handed tasks. Neither condition follows a normal distribution (Tablet: W=0.9, p- 
value=0.03; HMD: W=0.9, p-value=0.02). The Tablet participants spent an average of 193.26 seconds (SD=55.8) 
to identify and manipulate the valves, while the HMD participants only spent an average of 146.43 seconds 
(SD=26.4). A MWW test is performed and shows that participants using the HMD condition were significantly 
faster than the ones using the Tablet condition (W=45, p-value < 10%). 


Fig. 7 (b) shows the time difference between the Tablet condition and the HMD condition for the 2-handed tasks. 
The Tablet condition follows a normal distribution (W=0.9, p-value=0.06) for an average time spent of 146.7 
seconds (SD=36.1). The HMD condition follows a non-normal distribution (W=0.9, p-value=0.01) for an average 
time of 105.86 seconds (SD=28.4). Thus, we perform a MWW which shows that the HMD condition is also faster 
than the Tablet condition (W=49, p-value < 10“) when the Operator should use both his hands to interact with the 
physical system. 
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Fig. 7: Total time spent for (a) /-handed and (b) 2-handed manipulation. 


29 


4.2 Errors 


During the experiment, several types of error have been recorded. A Simple error is considered when the Operator 
make an incorrect identification of a valve indicated by the Expert, and a Critical error is considered when the 
Operator manipulate the incorrect valve. A Repetition error is considered when the Operator asks for the Expert 
to repeat the information already given. Simple and Repetition are only considered as one error, while a Critical 
error is considered as two errors, because considered the ones that can worsen the state of the system if performed. 
Table 1 summarizes these errors and their ponderation. 


Table 1: Total and average number of errors for each condition per error type. 


Total Average Total Average 
number number number number 
Type of errors 
for for forHMD for HMD 
Tablet Tablet 
Simple (x1) 49 2.45 3 0.15 
Critical (x2) 6 0.3 1 0.05 
Repetition (x1) 3 0.15 0 0 
Total with ponderation 64 2:9 5 0.2 


Table 1 shows a difference between the two conditions in terms of errors (58 total errors for Tablet condition 
(Average=3.37; SD=3.02) vs 4 total errors for HMD condition (Average=0.25; SD=0.64)). The SW test confirmed 
that neither condition follows a normal distribution (Tablet: W=0.9, p-value<0.03; HMD: W=0.4, p-value<10°’), 
thus allowing us to perform a MWW test. The result confirms that there is a significant difference between the 
total number of errors for both conditions (W=277, p-value <10°). 


In the Manipulation blocks, the Operator was invited to use only one or both of his hands to manipulate the valves. 
Table 2 shows the total and average number of errors in /-handed and 2-handed operations. For the /-handed 
operations, we observe that the Tablet condition has a mean of 2.21 errors per participant (SD=2.80), while the 
HMD has a mean of only 0.05 (SD=0.23). A SW test performed on both conditions shows that neither follow a 
normal distribution (Tablet: W=0.8, p-value<10°; HMD: W=0.3, p-value<10°%). Thus, we perform a MWW test 
that shows that there is significantly less errors performed on the HMD condition (W=314; p-value < 10%). 


For the 2-handed operations, the Tablet condition has a mean of 1.16 errors per participant (SD=1.21) while the 
HMD condition has a mean of only 0.2 (SD=0.523). As for the /-handed manipulations, neither condition follows 
a normal distribution (Tablet: W=0.8, p-value < 10°; HMD: W=0.4, p-value<10°). Then, we perform a MWW test 
to confirm that there is significantly less errors for the 2-handed operations from the HMD participants (W=279, 
p-value<107). 


Table 2: Total and average number of errors for /-handed and 2-handed operations per condition. 


Total number Average Total Average 
Condition for 1-handed number for number for number for 
1-handed 2-handed 2-handed 


Tablet 42 2.21 22 1.16 


HMD 1 0.05 4 0.2 
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5. DISCUSSION 


The overall results (Fig. 1) demonstrate a significant difference in performance time, supporting our first 
hypothesis (H7). In comparison to the Tablet condition, the HMD condition is 18,35% faster. We deeply examined 
both kind of manipulation performed by the participants, /-handed and 2-handed manipulations, separately to 
establish whether kind of manipulation performance is affected by our solution. We see a significant reduction in 
the amount of time needed to locate and operate the valves (24,24% for the /-handed, 27,84% for the 2-handed). 
This reduction can be linked to the obligation to put the tablet down for 2-handed operations, which may take 
some time. In the literature, Ladwig et al. found a similar reduction of 30% to locate the correct elements to operate 
using physical LEDs in comparison with only vocal exchange (Ladwig, 2019). 


In a similar way, the effect on the assistance provided to prevent choosing the wrong valve may be observed. Both 
for the /-handed and the 2-handed manipulations, we see a much-decreased rate of errors in the HMD condition 
(see Table 2). Overall, participants using our solution made 92,58% fewer errors than those using the Tablet 
condition, supporting our second hypothesis (H2). As shown in Table 1, a more thorough examination of the errors 
made reveals that there were 93,88 % fewer identification (Simple) errors and 83,33 % fewer manipulation 
(Critical) errors. These results are similar to the reduction in errors (89%) observed by Ladwig et al. when using 
physical LEDs (Ladwig, 2019). Thus, our approach, which uses virtual animations as indicators, makes it possible 
to avoid misunderstandings and misidentifications during remote support while preventing the need to modify the 
physical system to support the collaboration. 


6. CONCLUSION AND FUTURE WORK 


In this paper, we propose a new solution for remote collaboration using a MR client for a local operator and a VR 
client for a remote expert. This solution aims to help improve remote collaboration during maintenance procedures, 
using both video communication and 3D models to interact with each other. In our research, we propose a solution 
allowing both synchronous and asynchronous collaboration using the Replica paradigm. We performed a use case 
to compare our solution with a standard video communication using a video conference program. We stated two 
main hypotheses that we wanted to investigate about the performance (H7) and the number of errors (H2) of the 
participants. 


About the performance time, there was a significant effect of our solution on the total completion time of the 
participants, hence supporting hypothesis H;, that the time required to complete the maintenance was decreased 
by giving the Operator contextual visual aids. The fact that /-handed manipulations were also quicker proved that 
our solution has a positive impact on improving the assistance to identify the valves to manipulate, even though 
the improvement of 2-handed manipulations was expected due to the free hands provided by the HMD condition. 
In term of identification and manipulation errors, we see a significant reduction for the participants using the MR 
client. This support hypotheses H2 that contextual visual aids and using a hands-free device facilitate the 
Operator’s ability to identify and manipulate the equipment they must operate. 


In our use case, the Expert’s avatar was not on the same height level as the Operator. It might be interesting to 
study the impact that the presence of this avatar might have on the guidance provided by the Expert. Some studies 
have already observed a significant impact of an avatar presence, but without direct interaction of the remote expert 
with the 3D environment (Piumsomboon, Lee, et al., 2019; Wang et al., 2023). 


Furthermore, only the Operator side was studied during our use case. It will be necessary to carry out a usability 
study on the VR client of our solution. The system used for our use case didn’t have any usable sensors, so further 
experiments should be performed to evaluate the simulation model of our BIM-based DT and its impact on the 
collaboration. The simulation could be used by the Expert to perform diagnostic simulations and to test various 
maintenance operations before guiding the Operator, while the Operator could use the simulation model on its 
Replica to perform its own diagnostic simulation, and then compare it to the Exper?’s results. 
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ABSTRACT: Many recent developments in mixed reality applications are exploited for research on improving 
training in the construction industry. While immersive technologies offer indisputable advantages over classic 
paper- or multi-media-based training material, access to this kind of technology is still very limited in the academic 
world and even less widespread in industry. In this paper, the authors follow the current trend of creating low- 
threshold micro-learning nuggets, which are easily consumable on mobile devices but can be accessed in every 
web browser. This is essential to reach the construction trade workforces, which for the most part will own a smart 
or mobile device, but neither specialized equipment, nor will there be time or patience for a lengthy setup phase 
before learning content consumption. The learning content aims to give construction workers a clear vision of 
what some of the fundamental components of a sustainable construction site should look like and what role they 
play in achieving the said vision. The learning content revolves around the initial idea of DGNB certification 
(German: German Sustainable Building Council), waste management, certification of construction wood, 
handling of harmful substances and chemicals and some general health and safety regulations that impact the 
emission of dust, noise and vibration. The paper describes the general approach of the planning, orchestration of 
learning material, development of the learning nugget, and deployment, as well as a study for acceptance and user 
experience. 


KEYWORDS: DGNB, continuous education and training, micro-learning nuggets, responsible consumption and 
production, smart and mobile devices, sustainable construction, ubiquitous learning, workforce. 


1. INTRODUCTION 


The construction industry plays an important part of the European economy, employing over 13 million people 
(6.6% of the EU employment) (CEDEFOP, 2023). The sector’s related effects in other industries are known, for 
decades, to be extensive as the entire value chain from sourcing over to fabrication and final installation, 
maintenance and operation, and reuse of products consume enormous amounts of raw materials and energy. As 
such, 13.5 million other jobs in supplier industries are directly impacted by construction in Europe. In brief, the 
entire life-cycle chain in the built environment involves, to name a few resources, substantial plant environments, 
complex products, purposed machinery, and specialized trades with skilled personnel across industry sectors. 


While construction is generally seen as a catalyst for stability in economies around the world, it is also viewed in 
the public perception as one of the most important areas for the green transition, as it is responsible in large amounts 
for energy consumption contributing to waste and emissions (Teizer and Wandahl, 2022). Yet, in Denmark, the 
Renovation Wave and the New European Bauhaus, supporting building and infrastructure facilities in the entire of 
the Europe Union (EU) in becoming smarter and greener, construction significantly contributes to the green 
transition through large-scale installations of wind farms (EU 2020b). However, Denmark and other countries have 
realized that sustainable construction plays a significant role in achieving the 30% emission-reduction goal by 
2030 and becoming climate neutral by 2050 (DEA, 2023). 


While construction remains one of the least digitalized sectors in the EU, new digital technologies will shape it 
with increasing intensity. Demand for highly qualified workers is growing steadily, as will the skill needs for 
medium- to low-skilled occupations to learn and practice sustainable goals (UN 2023b). Job surveys, conducted 
for example by CEDFOP (2023), highlighted substantial training needs for construction workers. Many workers’ 
skills are not well-utilized, which is especially true of construction’s significant migrant workforce. Yet, new and 
far more complex rules and regulations in the built environment demanding ever stricter compliance of 
construction materials or products, or their integration into installation and maintenance processes, make it even 
more challenging for the workforce to keep up with state-of-the-art knowledge and practices. All reasons above 
call for additional learning tools that can create or attract and retain skilled and tech-savvy personnel. 
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The following research questions indicate the objectives of this paper: 


- While policy makers often force change through top-down approaches, what bottom-up initiatives can be 
taken to stimulate change by sustainable behavior? 

- Who are the suitable recipients and what are the key focus areas of learning sustainable goals? 

- How can awareness among the construction and real estate sector’s workforce be created with a tool that 
is simply to use and still actively engaging them in a learning exercise? 


2. BACKGROUND 
2.1 Policies on climate forcers 


The construction and real estate sector is one of the main sources of emissions globally. Typical new construction 
and renovation processes use various types of equipment, numerous sophisticated raw material or product 
resources, and a workforce that is highly specialized in their trade discipline. Yet, combined use in inefficient and 
wasteful building or operation processes, contributes significantly to different forms of waste and pollution, and 
irreversible consumption of quite significant amounts of energy (Andrade and Teizer, 2023). In the European 
Union (EU) alone, the sector is responsible for over 35% of the EU’s total waste generation and an estimated 5- 
12% of total Greenhouse Gas emissions (EC, 2022). On a global mission, according to DGNB (2023), a German 
non-for-profit organization named “Deutsche Gesellschaft für Nachhaltiges Bauen e.V.” (English: German 
Sustainable Building Council), there is further potential for action in the construction and real estate sector as it is 
also responsible for: 


- 30% of worldwide resources consumption, 
- 40% of worldwide energy consumption, and 
- More than 30% of worldwide carbon emissions. 


Consequently, the urgent need for greener construction is being addressed by several world climate agendas such 
as the European Green Deal, which aims for climate neutrality by 2050 through the green and digital 
transformation of EU sectors (EC, 2020a). 


2.2 Practices in the construction and real estate sector 


Three selected examples from Europe, Germany and Denmark point to the significance of the problem and show 
that the potential impact of bottom-up approaches, for instance, e-learning tools, can be high. 


Example from Europe: According to official reports, only 11% of the existing building stock in European Union 
undergoes some level of renovation each year (EC, 2020b). However, very rarely (1%, weighted annual energy 
renovation rate) do building renovation works address the energy performance of buildings, and worse, only 0.2% 
of building renovation projects reduce their energy consumption by 60%. At this pace, the report concludes that 
“cutting carbon emissions from the building sector to net-zero would require centuries.” It is time to act. 


Example from Denmark: Avoiding wasteful and energy-consuming construction, less than 3% of the Danish 
building stock is built newly on an annual basis. The vast majority of Danish projects in the built environment are 
already renovation efforts to make its buildings more sustainable. Moreover, most of its existing building stock 
has reached the age for renovation: Over 80% was built before 1990 and over 75% of the total floor area can be 
attributed to residential buildings (Statisics Denmark, 2019; Wittchen and Kragh, 2016). Additionally, Danish 
society, and presumably societies around the world, is becoming more and more aware of the negative 
environmental impact of our living environments. Almost 50% of the annual energy consumption (Nordic Energy 
Research, 2023) and nearly 12% of annual CO2-emissions can be attributed to households (Statistics Denmark, 
2023). By far, the majority of energy usage in a building takes place during its operational lifetime. In this 
combination lies a vast challenge how can we improve the performance of our building stock on a scale that meets 
the now urgent and strict energy requirements? 


Example from Germany: Construction and demolition in Germany in 2017 caused 220 million tons of waste, 53% 
of all industry sectors’ combined waste (SB, 2023). A fraction of the waste, while its raw material components are 
valued highly, is typically recycled (Circle Economy, 2022). Note, while this number includes the demolition of 
bulk material in road infrastructure, the principles of circularity in the construction and real estate sector yet have 
to arrive in full swing. Subsequently, as part of the Sustainable Development Goals (SDG 12) (UN, 2023), the 
sector can benefit from responsible consumption and production. The proposed learning tool addresses some of 
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these goals, see italic text: 


- Sustainable management of natural resources, recycling and reuse 
- Waste reduction 

- Avoidance of hazardous substances to air, water and soil 

- Reuse of components 

- Design for disassembly 

- Use of natural resources 

- Awareness for sustainable development 


2.3 Sustainability certificates and target group 


While the World Green Building Council has many initiatives around the world, for example, LEED and Green 
Star, DGNB was founded in 2007 and is Europe’s biggest network for achieving sustainable buildings. It is a non- 
profit organization (NGO) that aims to identify and promote solutions for the planning, execution and use of 
buildings and communities in order to achieve a sustainable future (DGNB, 2020). Pooling and sharing knowledge, 
translating sustainability into practice, and sensitizing the general public are its key objectives. As of February 26, 
2020, DGNB has more than 2100 members that are comprised of 20% architects/planners, 16% engineers, 21% 
manufacturers, 17% others, 10% project managers/consultants, 9% investors/developers, and 7% building 
contractors (DGNB, 2020). DGNB, like the other initiatives mentioned, offers expensive certification courses for 
a wide range of users, incl. practitioners, consultants, and even students. 


The representation of the building contractors in DGNB, and moreover, the large number of personnel it employs, 
including but not limited to laborers and site superintendents, and supposed to be more than any of DGNB’s 
membership, would be the ideal and predominant use group of the proposed learning tool. In Europe, the 
construction sector as a share of total employment in the European Economic Area varies by country between 4- 
10% (Statista, 2023). 


2.4 E-learning-content creation in construction and engineering pedagogy 


The previous sections have shown, policy and practice differ widely. While behavioral change is one approach to 
create awareness, yet, hardly any learning tools exist that actively engage the personnel at the workplace. While 
many methods exist to engage the construction workforce with learning (Wolf et al., 2022), E-learning (EL), 
defined as conducting learning via electronic media, typically on the Internet, depends on the self-motivation of 
individuals to study effectively. 


In the context to construction applications, the effectiveness of EL has been widely studied in safety training and 
construction management (Lee et al., 2014). Ho and Dzeng (2010) reported a positive impact on Taiwanese labor. 
Bokor and Hajdu (2014) focused in creating interactive content, incl. use of videos, to facilitate better 
understanding. Likewise, Lu et al. (2023) investigated the learning curves of participants in modular construction. 
Kim and Santiago (2005) focused on the instructional development process through the impact of educational 
technology. Clevenger and Ozbek (2013) and El-Adaway et al. (2014) both addressed service-learning concerning 
evaluating sustainability competences in the engineering curricula. However, their work made not much use of EL. 
Similarly did Love et al. (2015), while examining collective learning by coaching, not focus on sophisticated 
technical aids. 


A few other teaching methods are reviewed rather in brief to let readers understand the differences what they offer: 


- Game-based learning (GBL) is where game characteristics and principles are embedded within learning 
activities that reward and motivate the participant to think critically. In construction, Oo et al. (2016) 
focused a game on cost estimation and bidding and Sacks et al. (2007) on reducing waste in construction 
by applying lean principles. Teizer et al. (2020) utilized Internet of Things (IoT) technology in a serious 
game for the purpose of identifying and eliminating waste during the construction operations. Jacobsen 
et al. (2021) extended their work to multi-user GBL-experience in Virtual Reality (VR). Few studies exist 
to GBL with regards to sustainable design and LEED or equivalent concepts; Dib et al. (2012) is one of 
these. However, Dib et al. (2013), Ayer et al. (2016), Castronova et al.’s study (2017), Dancz et al. (2017), 
Clark et al. (2021), most of the studies use university students for evaluation. A late criticism, practitioners 
should be used instead, was raised by Adami et al. (2023). 

- Problem-based learning (PBL) uses complex real-world problems as a vehicle to promote student 
learning of concepts and principles as opposed to direct presentation of facts and concepts. It stimulates 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


finding and evaluating research materials, critical thinking skills, problem-solving abilities, and 
communication skills. PBL is often practiced in the format of groups, as seen in architectural studios and 
construction management courses at universities (Williams and Pender, 2002), and life-long learning 
workshop exercises or seminar series in industry (Duch et al, 2001). 

- Mobile learning (ML) is education or training conducted by means of portable computing devices such 
as smartphones or tablet computers. Among studies investigating m-learning technology acceptance (Al- 
Rahmi et al., 2021), Wolf et al. (2023) investigated construction product quality by utilizing mobile 
Augmented Reality (AR) technology. 

-  Life-long learning (LLL) is a formal or informal approach to personal or professional learning that is 
continuous and self-motivated. Froehle et al. (2022) studied civil engineering skills development in LL 
using semi-structured interviews. Gao et al. (2022) explored the drivers and barriers of LL in construction, 
Salajan and Roumell (2021) outlined a holistic view of LL for European learning opportunities and 
promoted clearer links across educational pathways and sectors, incl. 

- Micro-learning and Nuggets (MLN) focus on learning by creating concise and bite-sized chunks of 
information. Nugget learning is a subset of micro-learning that focuses on the development of personal 
nuggets or mini-lessons at the end of each unit (Ploder et al., 2021). While recent research focuses on the 
emergence of technology for MLN and how it can perfectly address the next-generation workforce’s 
learning needs (Nanjappa et al., 2022), delivery of MLN in the context of construction applications, 
including sustainability, has yet to be explored in greater detail. 


Several other studies in the architecture and civil engineering domain focused on the effectiveness of education, 
for example, Vorster (2010) and Mostafavi et al. (2013). Yet, they fall short in explaining the role of advanced 
technology in education and training. These limitations indicate a vast, and still unexplored, opportunity for 
leveraging technology in learning, as recently shown by Wang et al. (2020) and Wolf et al. (2022). 


3. METHODOLOGY 


Based on the concepts and related work mentioned in the background section, the methodology section introduces 
the exact goals and requirements for the training application, was well as the theory behind both the learning and 
gamification content. Figure 1 shows the general approach, which is derived from the common SCRUM practice 
in software or product development. 
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Fig. 1: Development approach for the training application. 


The motivation for the training application is based on the needs of stakeholders from the Danish construction 
sector. A central project coordinator assures the alignment of goals and requirements, as well as providing the 
solution to the stakeholders after the development. The vision and storyboard are developed with the academic 
partners after which the iterative development itself starts. A feature backlog is derived from the story board, with 
each feature being developed as a singular entity. Once enough features are finished, the next version of the training 
application is released for review with the project coordinator, who keeps contact with the construction sector 
stakeholders. This loop is repeated until the feature backlog is empty so that every feature from the vision is 
implemented to sufficient standards. After that, die release version of the application is compiled and provided to 
the project coordinator. 
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3.1 Goals and requirements 


The goal is to create a learning application that is easily consumable on mobile devices but can be accessed in 
every web browser if need be. Due to the ubiquitous availability of smart mobile devices and the ease of use of 
scanning i.e. QR codes, the mobile view is the favorable focus. The target audience is construction workers with 
basic training and at least prior experience regarding waste sorting on a construction site. 


The requirements were to create a fun training, uses gamification elements and transports some core rules for 
sustainable construction. The five main talking points are: Waste sorting on the construction site, certified wood 
use, handling of hazardous substances, health and safety regarding dust, noise and vibration exposure and the 
reduction of energy consumption. The story-board for the training was developed in a virtual whiteboard by both 
construction experts and academic staff in an evolutionary manner, iterating ideation, creation and feedback for 
each section after the initial story development. There is no integration into an existing Learning Management 
System, no user management or login required to use the training, no persistent data storage, especially not on 
personal information and the final product should be hosted on any regular webserver without special setup. 


3.2 Content and structure 


The learning content itself is very limited in this context, as the target audience should have prior knowledge in 
most fields touched upon in this training. It is assumed that the training acts more like a refresher of that long-term 
knowledge and a reminder of the importance of the acts of each individual. 


To minimize the time needed to read texts on the smart device, each section is introduced by an animated character 
video-narrated by a voice actor. The following take-away messages and key points were chosen for the training. 


3.2.1 Introduction 


Awareness for the impact of the way most industrialized nations work is one of the main goals of this training 
application. The introduction therefore lays the foundation by stating that, if scaling Denmark’s energy and 
resource consumption up to the world population, the world’s yearly available raw resources would be consumed 
420%, or within weeks instead of a full year. Therefore, the goal is to become generally better at reusing, recycling, 
and recovering, in order to reduce the consumption of raw materials, in this case in the construction sector 
specifically. As stated earlier, the DGNB certification is a comprehensive tool that creates guidelines for a 
building's environmental, economic, social, and technical quality. Many clients are demanding DGNB-certificated 
buildings and it is therefore important that craftsmen not only follow those guidelines but understand their impact. 
Therefore, the introduction offers a perspective of what can be done in everyday life on the construction site. 


3.2.2 Waste sorting 


Denmark’s goal for 2030 is to re-use, recycle and/or recover a minimum of 70% (weight percent) of raw waste 
materials. In order to reach this goal, there is a need to ensure efficient and correct material sorting on the 
construction site. With efficient sorting and reuse, the reduction of the currently excessive consumption is more 
easily achieved. While providing some background with the animated character videos, the waste sorting section 
should offer an active exercise in waste sorting as depicted in Figure 2. While most sorting tasks of raw materials 
seem obvious (metal into the metal container, wood into the wood container, etc.), the edge cases are the important 
ones to reduce waste and/or problems further down the recycling path. For example, metal buckets with leftover 
paint belong in the correct waste bin for harmful chemicals, wet sheetrock can not be recycled easily and should 
be separated from dry sheetrock and reinforced concrete belongs to regular concrete and not to general metal. 


3.2.3 Certified wood 


For valid certification of a building, every raw material has to be certified too. It is up to the craftsmen to validate 
those certifications on the material that arrives at the construction site. For the wood sector, Denmark generally 
uses two internationally recognized systems for FCK and PS3 from forests with Skovbo for the benefit of present 
and future generations. This quiz offered in this section should put the user into the perspective of a worker tasked 
with receiving a batch of wood. Before signing the delivery note, the user must identify the correct certification 
marks or decline the delivery completely. This step in particular is quite harsh in reality and awareness of the 
importance of correct certification needs to be established as the “higher goal” instead of “just keep working, it’s 
just wood”. 
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— Question M —_> Question N > 


Fig. 2: Sktech of a dragg and drop quiz for waste sorting. 
3.2.4 Hazardous substances 


While there is great value in being able to reuse materials, if the materials used on the construction site contain 
chemicals, they usually cannot be used in other contexts where health factors play a greater role. Awareness of the 
most common harmful substances is essential, as well as which substances are suitable for use on the sustainable 
construction site. There is a considerable number of valid certifications that can be found on the chemical’s 
containers and, once again, need to be recognized by the skilled craftsmen. The quiz in this section demands the 
user to judge the sustainability of certain chemicals, like paint for example, based on the packaging and once again 
to decide whether to accept or decline its use on the construction site. 


3.2.5 Dust, noise and vibration 


Health and safety play a major role on the construction site. Sustainability also encompasses a safe working 
environment and the awareness for dangers is one of the most crucial factors in preventing accidents. Especially 
the awareness of the accumulating severity of seemingly minor hazards like exposure to dust, noise, and vibration 
are put into focus in this section. Not only is the protection relevant for the workers conducting the operations but 
also for their coworkers and the surrounding neighborhood. The importance of reducing the noise, dust, and 
vibration at the source as much as possible and the awareness of restrictions regarding how much and when 
workers are allowed to apply the mentioned procedures are mediated with easy-to-follow videos and examples. 


3.2.6 Energy consumption 


Energy consumption is a key metric in every sustainability initiative. In the future, the EU taxonomy will include 
technical audit criteria for a wide range of activities related to energy supply and consumption on- and off-site 
construction. It is therefore important that craftsmen prepare for their future tasks, especially since more equipment 
and machines, incl. hand-held power tools become electric and battery-powered. Likewise, fossil fuel consumption 
is to be reduced, for example, idling vehicles present a significant waste and contribute to poor productivity. 
Moreover, excessive energy use to heat temporary office containers plays a critical role during late fall, entire 
winter, and early spring times, and as such, does cooling during summer times. Lighting of the construction site to 
ensure sufficiently safe walking pathways and workplaces, or providing a pleasant work desk atmosphere also 
contribute to energy consumption. While the implementation of Light Emitting Diodes (LED) have substantially 
reduced the energy needs, yet not all construction sites make use of it, and those who do, can either optimize their 
use and may at least need to become familiar with the appropriate recycling guidelines. 


4. IMPLEMENTATION 


In the following paragraph, the tools used to create the low-threshold training are presented, followed by the 
detailed description of the created content. The main editor for the created training is Microsoft PowerPoint 365, 
which is enhanced by the use of an E-Learning content management addon called iSpring Max. The latter allows 
the creation of interactive content and the compilation of the final product into HTMLS plus JavaScript, to be 
consumed with both mobile devices and PCs. The training could also be compiled into a so-called SCORM 
(Sharable Content Object Reference Model) package, so that it may be imported into regular learning management 
systems (LMS) as a separate activity in a more extensive learning path or e-learning course. When deployed as 
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HTMLS, the hosting is identical to any static website, meaning a regular webserver will suffice for hosting and 
displaying the training. 


For media creation, the assortment of applications of the Adobe Creative Cloud were used, most notably the Adobe 
Character Animator, which allowed the creation of the animated avatar and the voice acting to be conducted 
separately. The created avatar is depicted in Figure 3, showing some of the animations and the active lip sync. 


Fig. 3: The created avatar as used in the Character Animator. 


The avatar was chosen based on the stakeholders wishes, but can be switched out very easily if need be, while 
keeping the animations, triggers and the eye- and body-movement. When switching out the graphical 
representation, usually, the audio changes too. In that case, the lip-synchronization can be recalculated to match 
the graphic and audio representations. 


Figure 4 gives an overview of the structure and the logic implemented in the training application. 
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The initial introduction is done with a splash screen, followed by an avatar video. A splash screen is omitted in the 
sections, as an avatar video represents both the switch to another section, as well as providing background 
information and an introduction to the topic. Quizzes in the following description are all uniform in their way of 
providing feedback and explanation right after confirming in the answer and offering a second try, in case the given 
answer was wrong. After a second wrong answer, the question is marked as not successful. Each quiz only allows 
a single question to be answered not successfully, with unlimited attempts for each quiz. When finishing a quiz 
successfully, the next section is unlocked. 


Fig. 4: Structure of the training and implemented quiz logic. 
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Fig. 5: Training challenges: (a) waste sorting, (b) certified wood, (c) hazardous substances, (d) dust, noise and 
vibration, and (e) energy consumption. 


4.1 Waste sorting 


After the avatar video, the trainees are presented with the first quiz (Figure 5a). The quiz focuses the correct 
assignment of material to recycling bins or containers, with narration at the start of the quiz to clarify task and 
functionality. There are eight questions in the quiz, which are simply solved by dragging and dropping the waste 
to the correct bin. 


4.2 Certified wood 


After the avatar video, the trainees are presented with a real-world video, showing a truck delivering construction 
wood (Figure 5b). The quiz then revolves around the identification of relevant certifications. Several life-like 
delivery documents are presented and the trainees must choose to either accept or decline the delivery, based on 
the provided certification. 
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4.3 Hazardous substances 


After the avatar video, the trainees are presented with the third quiz (Figure 5c). There are six different typical 
chemicals, which are regularly used on construction sites, and the task is to check for the correct certification. 


4.4 Dust, noise and vibration 


After the avatar video, the trainees are presented with two real-world videos and instructions with narration 
regarding health and safety regulations (Figure 5d). As Health and Safety Trainings are regularly conducted, there 
is no quiz in this section. 


4.5 Energy consumption 


After the avatar video, the trainees are presented with two real-world videos and instructions with narration 
regarding health and safety regulations (Figure 5e). As health and safety trainings are regularly conducted, there 
is no quiz in this section. 


4.6 Final 


The final of the training is a quiz of four questions with a summary of the training at the end. 


5. STUDY AND PRELIMINARY RESULTS 


According to studies of Wolf et al. (2022) and Adami et al. (2023) practitioners as participants that evaluate the 
application matter. While the evaluation of the testing with an initial group of dozen practitioners is pending, 
preliminary comments that both workers and construction site staff left with the authors indicate that the developed 
application is technically sound to communicate the relevant content to the participants. The participants claimed 
to have known most, but not all of the content, thus receiving some value from using the application. While neither 
personal data about the participants nor IP-addresses of their smart devices accessing the application were collected, 
completion rate and time spent when navigating within application was recorded. Preliminary data indicates that 
participants stay within the envisioned time span of less than 15 minutes to complete the application. Abortions 
before ending the application were very low, probably due to the application still being novel. Yet, few participants 
mentioned that additional or variations of the examples within each module could help make it more attractive or 
revisit in the future or as part of continuous training short courses that they also recommended. One participant 
expects that repetitive use of it “makes some of us [workers] change thoughtless behavior”. Yet, the authors 
recommend the application for extensive testing with structured evaluation methods, utilizing, for example, 
anonymized pre- and exit surveys and/or interviews, and, perhaps, accompanied by behavioral observations of 
workers on the construction site. In addition, a verbal review meeting with site management staff who also acted 
as participants, revealed that demographic developments in the construction industry need to consider as well. 
While smart and mobile devices offers a communication style that attracts younger workforce and subsequently 
increases the chance that workers accept to participate, offering multiple international languages prevalent in the 
target country might be required, especially for migrating workforces that are not familiar with the local language. 
Likewise, the dependence on a voice actor while listen to the application may require a quiet space or headphones 
to operate it. Overall, preliminary results demonstrate technical feasibility and the workforce seems to accept the 
delivery method. 


6. CONCLUSION 


The tools mentioned in the implementation section allowed for a rather quick implementation of the desired 
training. While the technical and content goals were met, several quality-of-life requirements, which came up 
during the implementation process, could not be implemented due to restrictions in the frameworks, applications 
or time constraints. Most revolved around the exact depiction of feedback in the different quizzes or certain 
behavior regarding user interaction. Some of the preliminary results to a study showed that technology to create 
an effective learning nugget for achieving sustainability goals in construction exists. This can be used to counter 
the general trends in the construction workforce that can be named: significant labor shortage in certain trades, 
often replenished by migrant personnel, joining construction initially with limited skills or awareness. Future work 
will encompass the creation of multilingual trainings in the now focused field, as well as the creation of a larger 
scope training, based on multiple micro-learning nuggets, which will allow more room for learning content, media 
and assessment of learning outcomes. 
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ABSTRACT: Accurate process planning is essential for successfully implementing offsite construction projects. 
New technologies, such as virtual reality (VR), have been proposed as potential proactive solutions that allow 
users to experience and train on OSC processes to ensure safety and efficiency in an immersive environment. 
However, current VR applications in OSC projects (VR-OSC) problems are limited to residential projects and 
target single phases of the OSC implementation. This study proposes a VR framework to train participants on 
modular bridge construction processes. The developed model comprises several OSC phases, such as fabrication, 
transportation, and assembly. Furthermore, the study explores the use of collaborative platforms that can be 
associated with the VR model to ease the model and the developed scenes. The model is tested on a sample of 
participants that evaluated the performance of the model and provided areas of improvement. The results showed 
the capabilities of the model in providing an immersive experience for participants and connecting different phases 
of the OSC projects. Also, the results show that the experiment length and complex controlling buttons are among 
the areas of improvement. The developed model is expected to facilitate safe and efficient training for complex 
OSC projects. 


KEYWORDS: Offsite Construction, Building Information Modeling, Virtual Reality, Game Engines 


1. INTRODUCTION 


As the conventional construction method has been argued to provide less productivity compared to other industries, 
offsite manufacturing and modularization solutions have been proposed to enhance the efficiency of the 
construction industry (Alsakka et al., 2023; Hussein et al., 2022). Modular construction, as a part of offsite 
construction (OSC), is the process of assembling fully finished modules in the factory, transporting them to the 
job site, and installing them in the correct locations (Assaf et al., 2023). However, the efficient adoption of modular 
construction requires a high level of information technology and digitalization (Ezzeddine & Garcia de Soto, 2021). 
Hence, various digital solutions and technologies have been adopted in modular construction methods, including 
computer vision (Alsakka et al., 2023), blockchain (Wu et al., 2022), Internet of Things (IoT) (Li et al., 2022), and 
immersive technologies (Zhang et al., 2023). 


Virtual reality (VR), as a part of immersive technologies, has been used by many researchers to enhance the 
implementation of OSC techniques. VR can be defined as a simulation of the real world in which participants can 
interact with the virtual assets and experience different degrees of immersion in the simulated environment (Abbas 
et al., 2019; Alrehaili & Al Osman, 2022). These degrees of virtual immersion include non-immersive, semi- 
immersive, and fully immersive VR models (Zhang & Pan, 2021). The virtual scenes in the simulated 
environments are usually done using game engines (Olofsson, 2018). Through the use of game engines, the 
interaction rules and conditions are defined using coding scripts (Kumar et al., 2011). This combination of VR and 
game engines have been used in several research domains, such as health and therapy treatment (Mevlevioglu et 
al., 2022), the food and shopping industry (Gil-Lopez et al., 2023), and the construction industry (Boton, 2018). 


As the combination of VR and game engines is a promising approach in many industries, OSC also benefited from 
it. The VR-OSC research has gained massive attention in recent years. For instance, Zhang and Pan (2021) have 
proposed a VR model of tower crane location planning for modular construction projects. The model was created 
using the Unity3D game engine and was supported by a graphical user interface (GUI) to facilitate the selection 
of crane types, layout plans, and camera views. Similarly, Shringi et al. (2023) aimed to develop a VR model to 
train operators on crane operations in offsite construction. Their model included a safety index that was calculated 
based on penalties applied to each of the identified risks in the scene. In safety and ergonomics analysis, Dias 
Barkokebas et al. (2022) combined a VR model with a motion capture system to evaluate workers’ ergonomics in 
OSC factories. The VR model was developed using Unreal Engine, and the data collected through the motion 
capture technology was analyzed using rapid entire body assessment (REBA) and rapid upper limb assessment 
(RULA) methods. Similarly, Joshi et al. (2021) proposed a VR safety training model for precast factories for 
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employees. The model was developed using Unity3D and was used to train the employees on the personal 
proactive equipment (PPE), stressing processes, and safety measures in the factory. Inyang et al. (2012) developed 
a VR model to assess safety risks in panelized construction factories. Their model evaluated different layouts of 
the manufacturing facility, and ergonomic risks were evaluated for each of the selected layouts. 


For educational purposes, VR has proven to be an effective tool in educating participants and students on OSC 
processes (Eiris et al., 2020). For instance, Beh et al. (2022) introduced an educational model to train students on 
inspection activities of the OSC projects. Game engines were used in their model to develop educational game 
scenes, including fire inspections, leak inspections, and rain system inspections. Similarly, Sampaio and Viana 
(2014) created a VR model to educate students on the processes included in the prefabricated bridges. Furthermore, 
VR-OSC research also targeted the use of VR in evaluating design alternatives and factories/site layouts. For 
instance, Zhang et al. (2006) have developed a VR model to help participants in evaluating design alternatives in 
prefabricated construction. Their model assessed many criteria, including production time and cost. Zhang et al. 
(2021) developed a virtual environment to improve the generated value to the customer in offsite construction 
facilities. Many other applications of VR-OSC research were found in the literature, including collaborative digital 
platforms in modular construction projects (Ezzeddine & Garcia de Soto, 2021) and the use of VR in circular 
economy of OSC projects (O’Grady et al., 2021). 


Despite the contribution provided by the previous studies, they lacked the following aspects: 1) all of the mentioned 
studies have considered the use of VR in OSC residential projects. OSC, especially modular construction, can be 
employed in many project types, such as bridge construction, which is usually referred to as modular bridges 
(Lechner et al., 2021). Modular bridge construction includes fabricating heavy steel and concrete modules in 
manufacturing facilities and shipping them to the bridge to be aligned and installed (Xiangmin & Dewei, 2021); 
2) most of the presented studies have tackled the OSC life cycle from a single perspective, such as onsite 
installation (Zhang & Pan, 2020) and manufacturing phase (Dias Barkokebas & Li, 2021); 3) the developed models 
have limited accessibility, as the participants needed to be in the lab where the experiment was being held. In light 
of the mentioned limitations, the current study aims to bridge these limitations by developing a VR model that 
considers all of the implementation phases of modular bridge construction to train practitioners on different OSC 
processes and connect various project teams. The objectives of this study can be summarized as follows: 1) review 
the literature contributions in VR-OSC research; 2) develop a framework of a VR model considering the fabrication, 
installation, and transportation phases of modular bridge construction; 3) test the developed framework on an 
actual case study on a modular bridge and gather feedback from participants for improvement; 4) develop a cloud 
platform to support remote access to the model. The rest of the paper is organized as follows: Section Two 
summarizes the methodology outline, Section Three discusses the model development in the mentioned phases, 
Section Four discusses the analysis of the model and provides the areas of improvement, and finally, Section Five 
concludes the current study. 


2. METHODOLOGY SECTION 


This section demonstrates the methodology and the tools used in this study. Figure 1 outlines the study framework. 
As mentioned, the proposed framework aims to provide engaging and efficient training for various OSC processes, 
improve stakeholders’ connectivity and coordination, and assist participants in getting familiar with OSC processes. 
The proposed framework targets many project teams, such as the factory team, onsite assembly team, and 
transportation team. The developed model in the game engine is further upgraded to be accessible online for project 
teams. 


The framework combines the merits of BIM modeling and game engines. The framework starts with building a 
3D BIM model of the modular bridge construction project. The 3D model was developed based on the statistical 
system of an example of a modular bridge addressed by Xiangmin and Dewei (2021). Autodesk Revit is used in 
this step to develop the BIM model. An FBX format is used to export the developed BIM model, with all of the 
needed information and real dimensions, to the game engine tool. In this study, Unity3D is used as the game engine 
tool. Unity3D has been used by research scholars in developing serious games and VR scenes due to its efficiency 
and compatibility with technologies, such as BIM (Zhang & Pan, 2021) and hardware sensors (Jeon & Cai, 2021). 
After the FBX model is imported to the Unity3D platform, the interactions in the virtual scenes are added using 
C# programming language. The developed virtual scenes include the following: an exploring scene, a 
transportation scene, a yard scene, and an onsite scene. The scenes will be further detailed in the following section. 
To facilitate easy access to the developed VR model and leverage the use of the VR model, web cross reality 
(WebXR) and Web Graphics Library (WebGL) are used. WebXR facilitates the ability to develop fully immersive 
virtual models across the web using several types of hardware (Bao et al., 2022). A modular construction case 
study is also discussed to test the proposed methodology, and participants are then asked to evaluate the designed 
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tool, and the feedback is used to improve the development of the model. The following sections will detail the 
development of each of the discussed steps. 
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model 
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Tool evaluation 


Conclusion and 
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Fig. 1: The study framework 


Furthermore, the model testing was performed through two main steps. First, a non-immersive environment is 
created where the participant can enter any of the developed scenes and test the efficiency of the developed model. 
This option is valuable when deploying the developed virtual model on the shared cloud. This arrangement allows 
participants to access the model without the need to have a VR headset and controller. In the second step, the 
model is then tested in a fully immersive environment. The VR headset and controllers are connected to the virtual 
model in the Unity3D platform. The controlling buttons are then edited and added to the VR controllers. 
Participants in the fully immersive model can perform the same tasks using the VR controller. Participants are 
asked to evaluate the model after the experiment and to provide possible areas of improvement. 


Besides the software tools used in the study, a number of hardware tools are also used. The installation of the VR 
headsets and controllers is shown in Figure 2. The hardware used in this experiment includes the following: 1) 


48 


SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


HTC VIVE headset, 2) HTC controllers, and 3) two bases. All of these hardware pieces were connected to the PC 
and added to the Unity 3D game engine. 


2- HTC Vive 
Controllers 


3- HTC Vive Base 4- Desktop 
Stations Computer 


Fig. 2: Hardware used in the VR experience 


3. MODEL DEVELOPMENT 


In this section, the development of the model is discussed. As mentioned above, Autodesk Revit is used to develop 
the BIM model, Unity3D is used to develop the virtual interactive scenes, and WebXR is used as the web platform 
of the model. A case study of a modular bridge is discussed in this section to demonstrate the proposed 
methodology. It is worth noting that the location, details, and drawings of the case study are remained anonymous. 


3.1. BIM model 


This section shows the development of the BIM model. The chosen case study comprises the following OSC 
elements: prefabricated steel modules and precast reinforced concrete slabs. Figure 3 shows the details of the 
developed BIM model. The shown steel modules are first installed on the bridge, and then the precast slabs are 
installed on top of the steel modules. The fabrication of the steel modules is beyond the scope of this study. 
However, handling the steel modules at the factory is included in the developed scenes. 


Precast slabs 


Fig. 3: Developed 3D BIM model. 
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3.2. Virtual Scenes 


This section discusses the development of virtual scenes in this study. The section includes the following scenes: 
the exploring scene, the yard scene, the onsite scene, and the transportation scene. Further, the graphical user 
interface (GUID) is used to ease the accessibility of the developed scenes. Figure 4 shows the developed GUI. The 
user can easily navigate among different choices, including scene selection, scene instructions, and scene options 
(i.e., audio adjustment). 
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Fig. 4: GUI of the developed game scenes. 
3.2.1. Explore Scene 


In the Explore Scene, the participants can walk around the VR model in a fully immersive environment. Figure 5 
includes a snapshot of the developed scene. The use of this scene for participants is to get familiar with the project 
before the actual implementation. In this scene, the location of heavy equipment, material storage, special 
connections, and safety precautions are included. The participants control a character that moves in the virtual 
environment to explore the mentioned attributes. Participants can choose different camera views, such as first- 
person or third-person views. All of the elements in the virtual environment, along with the moving characters, are 
equipped with colliders, making the experience more realistic. Further, the scene is equipped with sound effects, 
such as footsteps, to increase the engagement of the participants in the virtual environment. 
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\ 


Fig. 5: Exploring scene of the developed model. 


3.2.2. Transportation Scene 


This scene includes the transportation of the precast slabs to the pickup point. Figure 6 shows a snapshot of the 
scene. The scene includes the following: a signaler character to guide the truck, signs for the safe entry of the truck, 
barriers that separate that truck's movement from other vehicles and elements on the job site, and vehicles moving 
in opposite directions to make the experience more realistic. Furthermore, the sounds of the moving vehicles are 
also added to the scene for an engaging experience for the participants. In this scene, participants (in this case, 
truck operators) are expected to drive the truck from the first location to the pickup location. The truck should be 
able to lift the slabs and assemble them on top of the steel module in the next scene. The scene also shows the 
clearance of the truck and surrounding elements. The scene then alerts the participant using a beep sound when 
the clearance is below a certain value. This is achieved through the physical attributes of the elements and collider 


functions in Unity3D. 
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Fig. 6: Snapshot of the transportation scene 


3.2.3. Onsite Scene 


The onsite scene includes the assembly of precast slabs on the steel module. Figure 7 shows a snapshot of the 
scene. The scene includes heavy equipment, such as a mobile crane, and heavy vehicles, such as trucks. The scene 
includes many tasks for the participants, such as moving the mobile crane, manipulating the crane boom, tying the 
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module, lifting the precast element, and assembling. The participant (crane operator in this scene) is expected to 
install the precast slab in the correct location. To make it easier for the participant, an assistant is provided for 
accurate precast element installation. For instance, a clearance check algorithm is provided so that the participant 
would know the distance between the element and the nearest surrounding elements. The participant will be 
notified when this distance goes below a certain value. Furthermore, participants can also enable an option that 
assists them with the installation process by indicating the projection area on the below elements. 


Distance 
: 2.30 


ah 
Fig. 7: Snapshot from the onsite scene 


3.2.4. Yard Scene 


Similar to the onsite scene, this scene targets the handling of the modules in the factory. It is worth noting that in 
the displayed case study, the factory is represented as a yard where the steel modules were being manufactured. 
Figure 8 depicts a snapshot of the factory scene. The scene shows how are the steel modules arranged on the yard. 
The participant is expected to do the same tasks included in the previous scene, but this time with an aim to place 
the steel module on top of the barge, which later will be shipped to the bridge to be installed. The participant can 
hook the module and lift it to be aligned on top of the barge. The model provides Various guidelines, such as alerts 
and the location of the installation. The participant can choose between various points of views, such as the operator 
view and the barge view. Sounds of the crane engines are also provided to engage the participants in the experiment. 


Fig. 8: Yard scene of the virtual model 
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4. RESULTS AND DISCUSSION 


This section discusses the testing of the developed model. A group of researchers from the University of Alberta, 
where this study was conducted, were invited to test out the developed tool. The testing was performed in both 
non-immersive and fully immersive environments. In the non-immersive environment, the participant was 
provided with a demonstration of the keyboard buttons, as shown in Figure 9. The controlling buttons can be 
adjusted to fit the user preferences. All of the scenes controlling buttons are added in the instructions tab of the 
developed GUI. The participant should review the controlling button prior to the start of the experiment. On the 
other hand, in the fully immersive experience, a number of hardware pieces are used, including the VR headset 
and controllers. As mentioned before, the installation of the hardware is shown in Figure 2. All of the used hardware 
pieces were connected to the PC and added to the Unity 3D game engine. Further, in the Unity 3D platform, a few 
add-ons were installed to enable the VR play mode. These add-ons are the XR plugin package and the SteamVR 
package. The XR plugin package converts the game scene in Unity 3D from regular mode to immersive VR mode 
after installing the VR headset. In addition, the SteamVR package supports many functions throughout the game 
scenes, such as teleporting and picking/dropping functions. 
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Fig. 9: Instructions of controlling buttons in the developed scenes 


Participants were asked to perform tasks in the four discussed scenes. It is worth mentioning that the model was 
tested using a small sample of VR researchers. The next step in future research is to evaluate the model based on 
construction and manufacturing professionals' feedback, i.e., workers, truck operators, and crane operators. The 
tasks that each of the participants needed to complete can be summarized as follows: 1) in the exploring scene, the 
participant was asked to tour the construction site and identify several elements in the modular bridge, including 
steel girders, modules, precast slabs, crane locations, and stayed cables. In the immersive reality, the participant 
controls the movement using the VR controllers and can also teleport easily to different positions with the 
assistance of teleporting areas and points. When pressing on the VR controller and aiming for a teleporting point, 
a light is generated in the virtual environment to help the participant navigate to the selected point. This is followed 
by a list of questions to ensure the participant's understanding of the construction site and factory. The questions 
range from basic to high-level and detailed questions; 2) in the transportation scene, the participant is asked to 
drive the truck safely from the starting point to the pickup location. The truck path is determined by barriers and 
supported by signals, i.e., slow speed signs. Further, participants are also assisted by animated characters, i.e., 
signalers, that direct them in the right direction. In addition, the participant is also asked to park the truck next to 
the mobile crane in the specified location; 3) in the onsite scene, several tasks are asked to be performed by the 
participant. The participant is asked first to turn on the mobile crane engine, move the crane boom, and lower the 
hook towards the precast slab. Following that, the participant is asked to lift the slab and move to its assembly 
location. Finally, the participant is asked to install (drop) the slab on top of the steel module; 4) in the yard scene, 
the participant is asked to perform similar tasks to the previous scene. However, in this scene, the tasks are more 
complex because of the size of the steel module, which requires more accuracy and precision. The participants are 
asked for their feedback on the developed model after completing all of the mentioned tasks. 
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The time spent by the participants in completing the mentioned tasks was 30 minutes on average. Participants 
reported the high level of immersion experienced and the high level of detail in the virtual environment. It was 
also reported that sound effects and guidance in the virtual environment have helped participants to be more 
engaged in the experiment. The participants also reported that the tasks required were clear in the virtual 
environment and displayed GUI, i.e., clearance check, and module installation assistance facilitated safe and 
accurate installation. However, participants also highlighted several areas of improvement: 1) the time of the 
immersive VR was quite long, and participants reported a low level of focus in the last 10 minutes of the 
experiment (last scene); 2) some of the controlling buttons were hard to follow, especially in the crane scene where 
the participant needed to control the crane movement, boom rotating, hook movement, and module lifting; 3) the 
placement of the module was found hard by most participants as it included movement in tight spaces. The 
collected feedback from participants is used to improve the performance of the developed model. This is part of a 
continuous study that is currently conducted by the authors. It is also worth noting that the multi-user environment, 
which enables connections between stakeholders in the VR model, requires further testing by the participants. The 
cloud application of the model using WebGL and WebXR was also tested. The WebGL plugin was supported by 
Unity3D. The function of this tool was to build the game scenes in a format that could be deployed online. Figure 
10 shows a snapshot of the cloud model. This cloud model was tested across different platforms to ensure its 
functionality. Furthermore, the model was also supported by the WebXR plugin. This allowed the participants to 
experience the developed scenes in a fully immersive manner. These tools provide accessibility to the model to 
multiple stakeholders. The scenes can be easily updated by the designer based on stakeholders' feedback on the 
cloud platform. It is also worth noting that the authors tested a few scenes of the model due to size restrictions. 
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Fig. 10: Cloud platform for the developed model 


5. CONCLUSION 


This research was motivated by the lack of VR-OSC research when dealing with complex structures, such as 
modular bridges. Hence, this study contributes by developing a VR model to train participants on the processes 
included in the construction of modular bridges. The model is supported by a GUI that facilitates the selection of 
the scene of interest. This includes exploring, yard, transportation, and onsite scenes. The exploring scene allows 
the participant to walk around the model and explore several elements and connections. In addition, the exploring 
scene is supported by the steamVR feature that allows participants to navigate among different positions in the 
scene by pointing the VR controller in the virtual environment to the desired location. In the transportation scene, 
the participant can maneuver the truck according to a defined path to the pickup point following the road signals 
and signaler characters. This is supported by sound effects to engage the participant in the VR experience. In the 
onsite scene, the participant can control the mobile crane to hook the precast slab and place it on top of the steel 
module. This is also supported by crane and truck sound effects to engage the participant in the experiment. 
Furthermore, a clearance check between the hooked precast slab and surrounding objects was added to the model. 
The displayed clearance distance color is turned into red, and a beep sound is played when it goes below threshold 
to alert the participant. In the yard scene, the participant can control the crane to hook and place the module on the 
barge. An assistance projection area is displayed to help the participant in placing the module in the correct location 
on the barge. 


The results show the capability of the model to immerse participants in the VR environment. Furthermore, they 
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also showed participants’ understanding of the processes through hands-on experience. However, the feedback 
from the participants showed a lot of areas of improvement, including the duration of the experience and the 
complexity of controlling the crane in the crane scenes. These areas of improvement are considered future research 
directions for the current study. The study provides theoretical and practical contributions. Theoretically, the study 
is considered a basis for a collaborative VR model that can facilitate process planning in OSC projects. Practically, 
the study provides an advanced VR model that can be used to connect and train OSC practitioners on various 
processes. Despite the contributions provided by the study, several limitations are included. The sample of 
participants is considered small compared to the scale of the study. However, this paper is considered a first step 
in continuous research that explores VR in OSC projects. Furthermore, the study theoretically explores the 
potential of the WebXR feature that allows easy access to the model. Future studies will be conducted to create 
access control for the developed model on the WebXR. In addition, the next step in this research will target the use 
of wearable sensors, such as eye-tracking sensors, to mitigate subjective feedback by the participants. 
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ABSTRACT. Web platforms are increasingly being used to connect communities, including construction industry 
and academia. Design features of such platforms could impose excessive cognitive workload thereby impacting 
the use of the platform. This is a crucial consideration especially for new web platforms to secure users’ interest 
in continuous usage. Understanding users’ cognitive workloads while using web platforms could help make 
necessary modifications and adapt the features to users’ preferences. Users’ usage patterns can be leveraged to 
predict the needs of users. Hence, the pattern of cognitive demand that users experience can be used to predict 
the cognitive load of web platform users. This could provide insights, generate feedback, and identify areas of 
modification that are critical for sustaining acceptability of web platforms. Using recurrent neural network, this 
study adopts electroencephalogram (EEG) data as a physiological measure of brain activity to predict brain 
signals (cognitive load) of users while interacting with a web platform designed to connect industry and academia 
for future workforce development. This paper presents a Long Short-Term Memory (LSTM) based approach to 
develop a model for predicting users’ cognitive load via EEG signals. Nineteen (19) potential end-users of the 
proposed web platform were recruited as participants in this study. The participants interacted with the web- 
platform in a real case scenario and their brain signals were captured using a five-channel EEG device. The 
validity of the proposed method was evaluated using root mean square error (RMSE), coefficient of determination 
(R°), and comparison of the predicted and actual EEG signals and mental workload. The results revealed the 
reliability of the model and provided a suitable method for predicting users brain signals while using web 
platforms. This could be leveraged to understand users’ cognitive demand which could provide insights for web 
platform improvements to engender users’ continuous usage. 


KEYWORDS: Cognitive load, electroencephalogram, industry-academia collaboration, long short-term 
memory, web platform. 


1 INTRODUCTION 


To achieve a balanced blend of theory and practice, as well as adequately prepare students for a rapidly changing 
industry like the construction sector, collaboration between industry and academia is important. Academia differs 
from the industry in that the industry is known for practical application of knowledge while academia is known 
for teaching and research. These differences are complementary in preparing the future workforce for the 
workplace. Therefore, this necessitates a connection between instructors and practitioners for collaborations in 
future workforce development. However, there are myriads of challenges plaguing these collaborations of which 
a prime challenge is instructors’ access to practitioners (Chandrasekaran, Littlefair, & Stojcevski, 2015). Since 
the outbreak of Covid-19, the internet is being increasingly used to connect individuals and communities, for 
example, to connect instructors to students, and buyers to sellers. The usage of the internet has been growing over 
the decades, with a transition from mere information sharing medium to workspaces, marketplaces, and even 
communities (H.-F. Lin, 2009; Schmutz, Heinz, Métrailler, & Opwis, 2009; Wellman, 2004). Dale, Basumatary, 
Iqbal, Khullar, and Shaikh (2022) used Facebook to connect diverse community users to archived language 
collections. Maher, Oropello, Roman, and Zeoli (2022) also showed how the internet was used to connect 
underserved communities to increase health care access and improve care outcomes. Internet-based technologies 
have also been leveraged to build virtual communities (H.-F. Lin, 2009). Therefore, a web-based platform could 
be used to connect instructors to different practitioners who are willing and able to provide complementary input 
in course offerings. Hence, a web-based platform was designed to give instructors improved access to practitioners 
who could provide complementary inputs in instructors’ pedagogical effort and support the preparation of students 
for the industry. However, during interaction with web-based platforms, there is a risk of cognitive overload. 
Cognitive overload is an indicator of non-intuitive interface, poor presentation of information which requires more 
efforts to interact with thereby exhausting cognitive resources. Therefore, to ensure that the web platform for 
connecting instructors with practitioners has little or minimal downsides, it is important to ensure it has minimal 
cognitive demand on users. 
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High cognitive load has been identified as an indication of web usability problem (Albers, 2011) but this is not 
always the case. Users’ performance and success in the use of web-based platforms depends not only on the web 
platform but also on users. This is because human cognitive resources at a point in time are limited, unstable, and 
vary from person to person. Also, the perceived cognitive demand for the same activities varies among people 
(Das, Chatterjee, & Sinha, 2013), for example due to differences in prior knowledge (Seufert, Janen, & Briinken, 
2007), users skill level (Kumar & Kumar, 2016), and amount of cognitive resources available (Tracy & Albers, 
2006). In addition, regardless of intrinsic features of a web-based platform that could impact users’ cognitive load, 
other extrinsic factors such as lack of adequate sleep, temporal demand, and stress (Tracy & Albers, 2006) could 
reduce the amount of cognitive resources available to users at a point in time. Therefore, due to these varying and 
fluctuating extrinsic factors, a user can experience different levels of cognitive demand on the same web-based 
platform at different times even when the platform is not changing. Majority of prior studies (Hewitt & He, 2022; 
F.-R. Lin & Kao, 2018; Mills et al., 2017; Schmutz et al., 2009) focused on detection of cognitive load and the 
impact of web-platforms’ intrinsic characteristics on users’ cognitive load with little or no attention on extrinsic 
factors that are user-dependent which also impact cognitive demand. This represents a major limitation to the 
generalizability of user experience on the same web platform due to the fluctuation and differences in human 
cognitive resources. This also accounts for disparities between usability evaluations and real-world scenarios 
which usually skewed the results of several user testing research. Hence, one-size-fits all approaches cannot meet 
users’ unique and differing needs. 


Therefore, to address the dynamism in web-platform usage because of the varying and unstable nature of cognitive 
resources, adaptive and personalized website design would be beneficial (Desai, 2021). To achieve this, 
(Adomavicius & Tuzhilin, 2005) recommended leveraging usage patterns to predict the needs of users. A reliable 
prediction of cognitive load is a fundamental step toward adaptive design (Appel et al., 2019). Hence, the pattern 
of cognitive demand that users experience can be used to predict the cognitive load of users. This could also help 
to generate feedback and identify areas of modification that are critical for sustaining acceptability of web 
platforms. In addition to subjective measures (e.g., NASA TLX), electroencephalogram (EEG) is a growing 
objective measure of cognitive load in human computer interaction. This has been used by previous studies 
(Caldiroli et al., 2023; Kumar & Kumar, 2016) to assess cognitive load in web-platform usage. Previous studies 
(Appel et al., 2019; Herbig et al., 2020) have focused on predicting cognitive load with other physiological 
measures (such as eye tracking metrics, heart rates, and galvanic skin response) using machine learning. Most 
previous studies (Caldiroli et al., 2023; F.-R. Lin & Kao, 2018; Mills et al., 2017) focused on using EEG to detect 
cognitive load in web platform usage. Only a few studies such as (Friedman, Fekete, Gal, & Shriki, 2019; Mills 
et al., 2017; Yoo, Kim, & Hong, 2023) used EEG for prediction of cognitive load in web platform usage. Mills et 
al. (2017) leveraged EEG spectral features using partial least squares regression to develop a model to predict 
cognitive load during interactions with an intelligent tutoring system. Yoo et al. (2023) developed a long short- 
term memory (LSTM)-based machine learning model to predict the degree of cognitive load using EEG data. The 
study showed that LSTM had the highest accuracy of 87.1% compared to random forest (64%), AdaBoost 
(64.31%), support vector machine (60.9%), XGBoost (67.3%), and artificial neural network models (71.4%). 
Using EEG data for prediction of cognitive load, Friedman et al. (2019) assessed different machine learning 
predictive models and reported that XGBoost has the highest predictive power compared to random forest, 
artificial neural network, and simple linear regression models. Therefore, if a web platform is held constant over 
time, users' cognitive demand can be predicted with EEG signals as they interact with the platform. Hence, this 
study leverages EEG signals to develop a model for predicting the cognitive demand of a web platform designed 
for industry-academia collaborations. The results of predicting users’ cognitive load could help identify patterns 
in the usage of the platform which could inform necessary modifications to ensure optimum usability that could 
influence users’ acceptance and intention to use the proposed web-based platform 


2 BACKGROUND 


The success of new information systems hinged on users' acceptance (Davis, 1985). However, high cognitive load 
could affect user’s satisfaction as well as acceptance of a new web-platform. For example, high cognitive load is 
an indication of web usability problems (Albers, 2011). Hu, Hu, and Fang (2017) demonstrated that cognitive 
load can affect user satisfaction with a website. This could affect users’ revisit, trust, and loyalty (Desai, 2021). 
Hewitt and He (2022) showed that difficulty of task to be performed and web page contrast could impact users’ 
cognitive demand and perceived usability. Schmutz, Roth, Seckler, and Opwis (2010) revealed that mode of 
presentation of information on web platforms impacts users’ perceived cognitive load. Examples of other 
problems associated with web-based platforms which could affect the cognitive load of users include confusing 
link name or description, horizontal scrolling, and atypical interface design which negate users’ mental model 
(Albers, 2011). The cognitive demand of web-based platforms is crucial because human cognitive resources are 
limited. Hence, there is a risk of web-platforms requiring more cognitive resources than what users possess, which 
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results in cognitive overload (Albers, 2011). Despite design principles, users of web platforms could be 
overwhelmed or confused because of information overload and/or excessive obstacles to overcome before locating 
the right information. Cognitive overload could interfere with mental processing of information which could cause 
users to exit a web page or even fail to locate appropriate content (Albers, 2011). Other manifestations/impact of 
cognitive overload on web users include task shedding, increase in frustration, multiple mistakes, lack of attention 
to detail and disregard of content (Albers, 2011; Kumar & Kumar, 2016). As cognitive demand increases, users’ 
performance plummets (Tracy & Albers, 2006). Hence, the need to assess cognitive load of users as they interact 
with web-based platforms. 


Though originated from psychology, assessment of cognitive load has translated into physiological sensing where 
objective measures such as EEG are increasingly being used to complement subjective measures (Kumar & 
Kumar, 2016). The limitations of subjective measures (such as bias and inability to currently recall actual 
experience or perception) make objective measures (e.g., EEG) growing methods for assessing cognitive load. 
Through electrodes on the scalp, EEG collects brain signals resulting from cognitive processes taking place in the 
brain (Kumar & Kumar, 2016). These signals vary depending on the type of activities in the brain and correspond 
to cognitive load (Mills et al., 2017). By leveraging deep learning techniques, EEG signals can be used to predict 
cognitive load via real time data from brain signals. Prior studies have demonstrated the efficacy of EEG to predict 
the cognitive load in different contexts (Moghar & Hamiche, 2020; Salman, Heryadi, Abdurahman, & Suparta, 
2018; Yoo et al., 2023) using Recurrent Neural Networks (RNN). RNN are deep learning techniques commonly 
used for time series forecasting of sequential data (Qin & Bulbul, 2023). However, major downsides of RNN 
include the time intensive nature of traditional RNN and difficulty in training the models because they are prone 
to vanishing and exploding gradient problems (Van Houdt, Mosquera, & Napoles, 2020). To circumvent this 
challenge, advanced architectures like LSTM are being used in diverse contexts to develop prediction models for 
time series. For example, in prediction of mental workload during construction task using augmented reality head 
mounted display (Qin & Bulbul, 2023), stock market prediction (Moghar & Hamiche, 2020), and weather 
forecasting (Salman et al., 2018). LSTM consists of input layer, output layer and an intermediary LSTM layer (or 
hidden layer) (Moghar & Hamiche, 2020). The input layer receives data as input, while the output layer determines 
data that will be output. The hidden layer is made up of memory cells and three gates that are in charge of updating 
the cell state. LSTM is a gradient-based method used for capturing long-term dependencies in sequential data 
(Hua et al., 2019). The primary component of LSTM that enabled this capability of LSTM is the memory block 
(Van Houdt et al., 2020). Memory block (or LSTM cell) is a subnetwork comprising a memory cell (also known 
as cell state) and three gates (namely, input gate, output gate and forget gate) (Staudemeyer & Morris, 2019). The 
memory cell retains the temporal state of the neural network while the gates control the flow of information. The 
input gate manages the inflow of new information into the memory cell using Equation 2 and 3 and updates the 
memory cell by Equation 4. The amount of existing information which remains in the current memory cell is 
controlled by the forget gate as illustrated in Equation 1. The output gate regulates the amount of information for 
computing the output activation of the memory block and how it propagates to the rest of the neural network (Hua 
et al., 2019) using equations 5 and 6. The structure of the LSTM cell is shown in Figure 1. 


f = o(We [he-1, x] + be ) ..Egqn. 1 
it = O(Wifht-1, Xt] + bi) ..Eqn. 2 
ce = tanh(W.[hte1, xt] + be) ..Egqn. 3 
a= ft O ca tit O ct .. Eqn. 4 
ot = o(W,[hit, Xt] + bo) ..Eqn. 5 
h: = ot © tanh(c:) .. Eqn. 6 


Weight matrices for the forget gate, input gate, cell state and output gate are denoted by Ws, Wi, We, Wo. In the 
same order, bs, bi, bc, bo represent the bias vectors. Elementwise (Hadamard) multiplication is denoted by ©, 
logistic sigmoid function by o, and the hyperbolic tangent function by tanh. h; and c: represent the hidden state 
and cell state at time t respectively. 
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Fig. 1: Architecture of LSTM network. 
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Mental workload can be estimated from spectral power which represents the brain rhythm’s energy (Qin & Bulbul, 
2023). A positive relationship existed between cognitive load and theta rhythm power, whereas a negative 
relationship exist between cognitive load and alpha rhythm power (Gevins, Smith, McEvoy, & Yu, 1997). Hence 
mental workload can be calculated using equation 7 similar to Qin and Bulbul (2023). 

MW(t) = ar 5 Eqn. 7 
MW(t) represents the mental workload at time t; f(t) and ap(t) are the mean spectral power of theta and alpha 
rhythm at time t respectively. 


3 METHODOLOGY 
3.1 Overview of the Web Platform 


The web platform in this study is designed to be a collaborative network of instructors and practitioners for future 
workforce development. The aim of the platform is to improve instructors’ access to practitioners who could 
provide practical supplementary inputs in construction engineering education to aid students’ preparedness for the 
industry. The potential users of the platform are instructors in construction-related programs (such as Building 
Construction, Architecture, Civil and Environmental Engineering as well as Construction Engineering and 
Management) and construction industry professionals. The platform was designed by leveraging participatory 
design, interaction design and user-centered-design principles (Freire, Arezes, & Campos, 2012). Users’ input and 
participation in the design process were ensured through usage research. Usage research was used to elicit 
pertinent information from end users. The information elicited served as inputs for the design of optimal graphic 
user interface of the platform. By leveraging heuristics design principles for user interface design (Nielsen, 1994), 
the platform was designed to be typical to other platforms that potential users are familiar with. This is to ensure 
that the platform operational procedure is similar to users’ mental mode which could enhance ease of use of the 
platform as well as users' acceptance. To use the platform, an instructor is required to sign up, verify email address, 
complete profile, submit request for course-support, view recommended practitioners from the platform and select 
preferred practitioner to meet the course-support request. The course-support requests include site visits, guest 
lectures, seminars, workshops, and other activities that allow students to interact with practitioners under the 
guidance of an instructor. The platform was designed using JavaScript programming language. A relational 
database management system (MariaDB) was adopted with Node.js as server. 


3.2 Experimental Design 


After a brief introduction of the platform to participants. The procedure of the experiment was explained. All 
participants provided their informed consent by signing the consent form. The participants interacted with the 
web-based platform. Each participant was required to sign up on the platform. Thereafter, participants verified 
their email address before first login. Upon login, the participants were required to complete their profile after 
which they requested a course-support from practitioners. After a request for course-support, participants viewed 
recommended practitioners to meet their course support request. Out of these recommendations, instructors made 
a selection. Every session of the experiment was conducted under similar conditions. 


3.3 Participant and Study Approval 


Nineteen (19) participants were recruited after the research protocol was approved by the Virginia Tech 
Institutional Review Board. The participants include both male and female professors (the proposed end-user of 
the web-based platform) with varying degrees of experience, different job titles and from diverse construction- 
related academic programs such as civil and environmental engineering, building construction, architecture, and 
construction engineering and management. 


3.4 Data Collection 


As participants use the web-based platform, their cognitive load was objectively measured via braai signals using 
an electroencephalogram (EEG) device called EMOTIV Insight. EMOTIV Insight has five channels, namely AF3, 
AF4, T7, T8, Pz with semi-dry polymer sensors and two reference sensors (CMS and DRL). The channels are 
arranged according to the 10/20 international EEG system. EMOTIV Insight has a sampling rate of 128 samples 
per second per channel for EEG signal with frequency response of 0.5-43Hz, digital notch filters at 50Hz and 
60Hz. The device has Bluetooth connectivity which can be connected to a computer or mobile device with 
Bluetooth V5.0. EMOTIV Insight provides coverage of the frontal, temporal and parietal lobes which are 
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associated with cognitive effort (Kumar & Kumar, 2016). The EEG recording was about ten (10) minutes on 
average per participant. An overview of the methodology is shown in Figure 2. 


Data Preprocessing 
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Electrode shifting, electrical line noie Eye bink, muscle activity, beart rate 


Data Collection 


A participant interacting with the web-pbeform 


Pou tea of EEG Emotr inaght 
Channels and References EEG Derse) 


EEG Signal Prediction Using LSTM 


LSTM Model Structure EEG Signal Prediction Across 5 Channels 


80% for Training 
Y 20% for Testing 


LSTM layer 


Y 


layer 


Y 


Fig. 2: Overview of methodology. 


3.5 Data Preprocessing and Analysis 


The raw EEG data collected with the EmotivPRO app was cleaned. Thereafter, the processes highlighted below 
were carried out. 


3.5.1 Artifacts removal 


EEG signals are susceptible to diverse categories of artifacts which represent noise/interferences to signals of 
interest. These artifacts are either intrinsic or extrinsic. Intrinsic artifacts are generated by EEG user’s body 
movement such as blinking and muscle activity. Extrinsic artifacts originate from external factors such as shifting 
of electrode, noise from electrode wiring and surroundings noise (Jebelli, Hwang, & Lee, 2018). According to 
Urigüen and Garcia-Zapirain (2015), these artifacts are usually small when the EEG device is used in a somewhat 
stationary position as it was in this study. Both intrinsic and extrinsic artifact removals were done using EEGLAB. 
The cleaned EEG data in CSV format were converted to MATLAB file and imported into EEGLAB. The data 
was mapped and structured using 5-channel location. The extrinsic artifacts were removed using basic band pass 
filter range of 0.5Hz to 60Hz. As recommended by Delorme and Makeig (2004), Extended Infomax method was 
used to decompose the EEG data through independent component analysis (ICA). The data was decomposed into 
5 components, displayed with scalp heat maps and intrinsic artifacts were rejected. 


3.5.2 Data Processing 


Five (5) brain wave frequency bands were captured by each of the five (5) electrodes of the EEG device (EMOTIV 
Insight) used in this study. These frequency bands include Theta (4-8Hz), Alpha (8-12Hz), Low Beta (12-16Hz), 
High Beta (16-25Hz), Gamma (25-45Hz). The cleaned data for all the nineteen participants from the five (5) 
channels of the EEG device were used for the analysis. There were 79936 data points on average for each 
participant for an average recording time of 10 minutes. The data points were split into 80% and 20% for training 
and testing respectively. 


3.5.3 Prediction framework 
The preprocessed EEG data was used to train the LSTM network for prediction of EEG signals. Open loop 
forecasting was adopted because true values of brain signals (representing cognitive load) from EEG were used 


to train the LSTM network for prediction. Similar to Kingma & Ba (2014), Adaptive Moment Estimation which 
is an extension of the stochastic gradient descent algorithm was used for optimization with a learning rate of 0.001. 
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To ensure that the loss is as small as possible an epoch of 250 was adopted for training the model. Root Mean 
Square Error (RMSE) was used to calculate the loss function to determine the performance of the model. The 
LSTM layer has 128 hidden units or memory cells to capture and store information over time, which enables the 
LSTM network to process sequential data effectively. The hidden units determine the amount of information 
learned by the layer. Both the sequence input layer and the fully connected layer of the LSTM regression neural 
network have sizes that match the number of channels of the input data. 


3.5.4 Mental workload 


Because it is not possible to directly measure mental workload from EEG signals, the signals were converted into 
frequency domain. This conversion enabled the calculation of the average spectral power of particular brain 
thythms, hence, the Power Spectral Density (PSD) of the signal was calculated. PSD is a measure of the mean 
power distribution of a signal over a specific timeframe with the unit showing energy per frequency (Qin & Bulbul, 
2023). The mental workload was estimated using equation 7 for both the actual and predicted EEG signals. 


4 RESULTS AND DISCUSSION 


4.1 Performance Evaluation 


The performance of the predictive model was evaluated using RMSE. The RMSE shows the difference between 
the predicted and actual values of the EEG signals. Table 1 shows the RMSE for all the test participants. The 
average of the RMSE was 0.0674. The RMSE of the test participants' datasets were very low (<0.037) except for 
the third participants whose RMSE was 0.1607 which skewed the average of the RMSE to 0.0674. However, the 
low RMSE of the other test participants’ datasets reveal the high predictive power of the LSTM model by 
indicating marginal difference between the actual and predicted EEG signals. This agrees with Miyamoto, Tanaka, 
and Nakamura (2022) who posited that the closer RMSE is to zero the better. The high RMSE of the third 
participant in the test dataset could be attributed to insufficient data points. All other participants had more than 
74,000 data points while the third participant had about 58,500 data points amounting to a difference of 16,000 
data points. Also, the EEG recording time of the participant was very short and fell below the average duration. 
This agrees with Pyo et al. (2018) who opined that low RMSE might be because of insufficient data points. In 
addition, although 58,500 data points seem considerably high, this result reveals that prediction models require 
large amounts of data for accurate forecasting. This position is also supported by Ettinger et al. (2021), even 
though there are no fixed number of data points required for predictive models. However, considering other factors 
such as complexity of problem, desired performance and complexity of model, this result could provide a guide 
for future research. 


Table 1: RMSE for test participants. 


Test Participants 1 2 3 4 


RMSE 0.0356 0.0367 0.1607 0.0367 


The performance of the LSTM prediction model was further assessed as shown in Figure 3 by comparing the 
predicted and actual EEG signals of the test dataset for the five (5) EEG channels. The comparison reveals that 
the predicted EEG signals follow a very similar pattern as the actual EEG signals. Although there were minor 
deviations where the path of the predicted signals did not align with the actual EEG signals, to a large extent, the 
model was able to accurately predict sudden and subtle fluctuations. However, it appears that the model was able 
to predict subtle fluctuations better than sudden drastic changes in the EEG signals. Overall, the predictive model 
can be adjudged reliable especially in predicting subtle fluctuations in the EEG signal. To further show the 
performance (validity) of the predictive model, scatter plot was used to plot the predicted values and the actual 
values of the test data set for each EEG channel (see Figure 4). 
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Fig. 3: Comparison of predicted and actual EEG signals for the test dataset across the five EEG channels. 


The scatter plots in Figure 4 and the R? in Table 2 show that the model was able to explain a significant proportion 
of the variability in the actual EEG values. R? is the coefficient of determination which indicates the goodness-of- 
fit of the regression model. Given the low RMSE for the test participants and the high R? for the channels, it is 
evident that the model captured the underlying pattern in the data effectively, and the predicted values are very 
close to the actual values. The model could therefore be considered accurate because as Alexander, Tropsha, and 
Winkler (2015) explained, RMSE is a useful indicator of a model's practical value. The high R? values show that 
a significant portion of the underlying patterns and relationships in the actual data is accounted for by the 
predictions made by the LSTM model. This is because according to Chicco, Warrens, and Jurman (2021), RMSE 
is a measure of the average errors between predicted values and actual values while R? explains the amount of 
variance in the data that the model could explain. Hence, the overall value of a model has been defined by its 
accuracy and precision and as well as by its effectiveness in elucidating the variability in datasets (Coulibaly & 
Baldwin, 2005; Qin & Bulbul, 2023). Also, given that the low RMSE values were for the test participants while 
the high R? values were for the EEG channels, it is shown that on the overall for a participant, the model was able 
to achieve little error between the actual EEG signals and the predicted EEG signals, and for each EEG channel, 
the model was able to explain a significant portion of the variability in the data for prediction. Hence, the model 
is able to give reliable prediction of participants’ EEG signals. 


Table 2: R? for each channel. 
Channel AF3 T7 PZ T8 AF4 


R? 0.9336 0.7683 0.8022 0.7541 0.8854 
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Fig. 4: Scatter plots showing the predicted and actual values for the 5 EEG channels. 


As shown in Table 2, the R? values are > 0.7541. Channel AF3 has the highest R? value, this is followed by channel 
AF4, PZ, T7 and T8 respectively. The scatter plots show the linear relationship between the predicted and actual 
EEG signals. Although as shown in Figure 4, there are few data points that deviated from the linear relationship 
in each EEG channel, a great proportion of both the predicted and actual EEG values fit into the linear relationship. 
For example, the lowest R? value is 0.7541 shows that about 75.41% of variance in the actual EEG signals is 
accounted for by the predicted signals. According to Coulibaly and Baldwin (2005), R? values in the range of 0.8 
- 0.9 are considered acceptable and those > 0.90 are considered very satisfactory. Only three EEG channels (AF4, 
PZ and AF4) fall within this range. However, as revealed by Alexander et al. (2015), RMSE is a more informative 
indicator of a model's usefulness compared to R?. This is because, the value of a model should be based on its 
accuracy and precision and not on its explanatory power of variability in a particular data set (Alexander et al., 
2015). Chicco et al. (2021) also noted that R? value can be quite low even when dealing with a fully linear model, 
and the opposite is also true. Therefore, overall, the results show that brain activity of users using a web-based 
platform can be reliably predicted with EEG signals. 


4.2 Mental Workload 


Figure 5 shows the scatter plot of the predicted mental workload plotted against the actual mental workload. 
According to Coulibaly and Baldwin (2005), the R? value (> 0.90) was very satisfactory. This shows that the 
predicted mental workload matches the actual mental workload which further reinforces the efficacy of the LSTM 
model to learn and predict the cognitive load of users during industry industry-academia collaboration via a web 
platform. The results reveal that 92.50% of the variance in the actual mental workload can be explained by the 
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predicted mental workload. This provides a reliable prediction of mental workload of users in industry-academia 
collaboration via a web platform. This potential of LSTM model to predicted cognitive load has been corroborated 
by earlier studies such as Salman et al. (2018) and Qin and Bulbul (2023). 
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Fig. 5: Relationship between actual and predicted mental workload. 


5 CONCLUSION, LIMITATIONS AND FUTURE WORK 


Cognitive load is a major consideration in the design and usage of user interfaces because it could influence users’ 
attitude towards the web platform as well as continual usage. Through web platforms, the internet is being 
leveraged to connect instructors in construction-related programs with construction industry practitioners who 
could support their pedagogical efforts in preparing students for the workplace. Using LSTM, this study assessed 
the effectiveness of EEG-based prediction of brain signals (representing cognitive load) as instructors interact 
with the web platform designed to connect them with practitioners. The results demonstrated the accuracy and 
reliability of the LSTM model to predict EEG signals as users interact with the web platform. The model was able 
to predict subtle fluctuations better than sudden drastic changes in the EEG signals. The results showed low RMSE 
and high R? values which indicate that the model’s predictions are close to the actual values, and it is explaining 
much of the variability in the data. The efficacy of the model to predict EEG signals could be leveraged to 
understand users’ pattern of cognitive demand in human-computer interaction. This pattern of users’ cognitive 
demand could provide a better understanding of the cognitive resources expended by users as they interact with 
the web platform. This is critical because users’ cognitive resources and cognitive demand varies due to both 
intrinsic and extrinsic factors hence a one-time detection of cognitive load might not provide adequate insights. 
The prediction of EEG signals could be used to understand users’ usage patterns and necessary modifications 
required to enhance interface functionality, navigation, content integration as well as user experience. This is 
crucial for new web platforms which users are unfamiliar with and which could operate differently from their 
mental model. Also, the process of users’ acclimatization with the platform as well as the impact of learning curve 
in using the web platform could be better understood through the prediction model. The study has some limitations 
which should be acknowledged. Although the sample size is adjudged adequate, using a higher sample size could 
yield better results. Also, LSTM was used in this study, future work could focus on using different network models 
for comparison of accuracy and reliability. Future work could likewise explore achieving lower RMSE and higher 
R’. 
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TRANSITIONING FROM 2D TO VR IN DESIGN REVIEW - 
RESISTANCE TO ENGAGEMENT 


Shahin Sateei, Mattias Roupé & Mikael Johansson 
Chalmers University of Technology, Sweden 


ABSTRACT: Although immersive virtual reality (VR) has been shown to facilitate collaborative understanding of 
a design, many users remain resistant to its use. Moreover, there is currently a lack of real-world studies 
investigating why certain users (e.g., architects) are resistant to use VR during design reviews. The aim of this 
study is to understand the resistance that influence client representatives’ and architects’ interaction with a VR- 
system that supports both fully- and non-immersive experiences of the virtual environment. Data were gathered 
from three VR-workshops, which were part of 3 design review sessions of a new elementary school. Additional 
data were gathered from four semi-structured interviews with both the architects and client representatives 
participating in all workshop sessions, the interior architect involved in the project as well as an additional six 
semi-structured interviews. These additional six interviews involved exterior architects from different firms, who 
had previously used VR for both informative and design review purposes. The findings suggest that client 
representatives and the architects had initially been resistant to use VR during the design reviews, but their 
attitudes changed progressively during the three workshops, in particular that of the architects. The findings also 
indicate that interactive features in VR (e.g., object manipulation, multi-user) help end users negotiate design 
requests more efficiently and make informed decision-making. This paper highlights how immersive VR could 
improve the design review process. 


KEYWORDS: Virtual Reality, HMD VR, design process, design review, spatial understanding, end-users. 


1. INTRODUCTION AND RELATED WORK 


Fully immersive Virtual Reality (VR) has emerged as a potential alternative to traditional information and 
visualization media (e.g., 2D drawings, 3D models) where users are surrounded by a virtual environment of the 
building design. This experience of users “stepping into the design” is referred to as an immersive experience 
(Castronovo et al., 2013; Johansson, 2016). When fully immersed, users can experience the design from a user- 
centric viewpoint, i.e., an ego-centric frame of reference (Paes et al., 2023), which provides clearer visual cues 
(e.g., size, shape, location) (Hermund et al., 2017). Enhanced visual cues not only help users better perceive 
volumetric qualities of the building design than when 2D drawings and 3D models are used (Chowdhury & 
Schnabel, 2020), but also help users gain a more representative understanding of the final building design (Nikolić 
& Whyte, 2021). It is important to note here that 3D models viewed on a traditional screen are considered non- 
immersive VR (Castronovo et al., 2013), e.g., when 3D models are displayed on a computer monitor, projector 
screen or a multitouch table (Dorta et al., 2016), the users are not immersed even though they do view a virtual 
environment. In contrast, immersive VR-systems provide either a semi-immersive experience (i.e., experiencing 
the virtual environment through stereoscopic displays) or a fully-immersive experience (i.e., with head mount 
display (HMD) and motion-tracking). Studies have also explored how hybrid design environments, (i.e., 
combining traditional design techniques such as sketching by hand with immersive ones like VR), influence users’ 
understanding of the design. For instance, Okeil (2010) showed how design team members, interacting with 
available 3D computerized sketching feature combined with a visual understanding enabled by a semi-immersive 
CAVE system, were able to efficiently explore and iterate on different design ideas. More specifically, by viewing 
the design that had been drawn in the non-immersive environment, design team members could immediately see 
the outcome of their design decisions in the semi-immersive environment, resulting in a more rapid cycle of testing 
and validating of different designs. 


In the context of users’ interacting with different VR-systems, interactive features in fully-immersive VR such as 
object manipulation (Wolfartsberger, 2019), multi-user (Truong et al., 2021) and multi-scale (Sateei et al., 2022) 
have shown to facilitate a mutual understanding of the design between end-users and design team members. For 
example, the ability to combine multi-user and object manipulation to use task-based scenarios during design 
review has shown to accelerate decision-making when resolving design issues. Specifically, by enabling 
furnishment and collaborative review of the virtual environment in real-time, end-users can better understand 
which layouts support building occupants’ work tasks whilst also reducing the overall lead-time of the design 
process (Roupé et al., 2020). Accordingly, task-based scenarios in VR shift design review from interpreting the 
design to understanding building occupants’ daily work tasks (Nikolić & Whyte, 2021). This understanding of 
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building occupants’ daily work tasks is then more likely to result in collaborative practices such as Co-design 
where the end-users become part of the design team (Caixeta et al., 2019; Roupé et al., 2020). 


However, although several studies have shown the benefits of using VR in the design process, few have explored 
end-users’ and design team members’ engagement in using VR for design review purposes (Maftei et al., 2018). 
For example, questions remain as to how a collaborative understanding for the design may be facilitated with VR 
when VR as an information and visualization medium, whether semi- or fully-immersive, is primarily used as a 
presentation tool by architects (Achten, 2021; Scheer, 2014). One explanation might be the lack of knowledge on 
how to use VR in workflows where 2D drawings and 3D models are used (Zaker & Coloma, 2018). Another 
explanation highlighted by the literature is that due to the lack of real-case studies, stakeholders such as architects 
are resistant to using VR due to not knowing when in the building process to use it. Specifically, there is an initial 
resistance to using VR in a project setting, and its value in understanding end-users’ design preference is often 
realized too late in the project when time constraints arise (Belek Fialho Teixeira et al., 2021). Therefore, the focus 
of this study is twofold: 1) to understand client representatives’ and architects’ resistance to engage with VR- 
systems with support for both fully- and non-immersive experiences of the building design and 2) how architects 
and client representatives collaborate to resolve design issues when VR is used in a design review context. 


1.1 Clients’ and architects’ use of VR for design review 


Studies have explored the advantages of using VR in the context of design review and from a client perspective. 
One example is that end-users such as client representatives seem to become less reliant on the design team for 
their interpretation of the design (Kim et al., 2016), which could help reduce decision-making time during design 
review (Liu et al., 2020). Another example is a study by Liu et al. (2020) where they found that semi-immersive 
VR did not only help those who have difficulties interpretating 2D drawings but also helped project members who 
had not yet seen the design to better understand it. Similarly, in an experimental study, Umair et al. (2022) observed 
that participants’ task completion time was shorter in fully immersive VR compared to 2D drawings when 
identifying design issues. Studies have also investigated challenges to wider adoption of VR. Examples of these 
challenges are clients’ lack of knowledge of how VR-based practices should be adopted (Zaker & Coloma, 2018) 
and the lack of real-life case studies that explore how decision-making that is typically done in later phases of the 
building process, could be made already in earlier design phases (e.g., concept design phase) (Nikolić & Whyte, 
2021). 


From an architect perspective, VR is seen as one of many available information and visualization mediums (Kim 
et al., 2016). Whilst drawing has been the traditional communication tool of architects (Scheer, 2014), recent years 
have seen a continuous increase in use of a 3D model-based approach when building information modelling (BIM) 
is used in project, whether it is throughout the entire construction process (Disney et al., 2022) or only limited to 
the design process (Smith, 2016). In this context of using 3D models in the design process, VR models have been 
used when extracting them from the BIM model (Johansson, 2016), resulting in architects able to showcase the 
building design in VR. Still, the literature shows that architects maintain control over decision-making in the design 
process when using VR as they have when 2D drawings are used (Scheer, 2014). An example is the use of pre- 
defined viewpoints in the VR model during architectural walkthroughs of the building design. The argument for 
using pre-defined viewpoints is that it prevents end-users from being overwhelmed with too much detail in the 
virtual environment, which ensures that end-users maintain focus on resolving intended issues during design 
review (Castronovo et al., 2013). Additionally, previous work highlights how VR challenges the hierarchical 
position of architects, who are used to predictable and controlled working methods, such as when 2D drawings are 
used (Cruickshank et al., 2013; Scheer, 2014). 


Beyond these challenges relating to both client and architects, there is a lack of studies on how use of VR can 
affect stakeholders’ acceptance of VR over time and how the interactions between architects and clients may be 
affected. While many research efforts have explored the use of VR-based design review in real-life cases, most of 
these have primarily concentrated on using VR in one-time sessions or semi-immersive use (e.g., power wall) 
rather than fully-immersive (e.g., HMD) VR-systems (Liu et al., 2020). Few have explored how the use of VR 
over several design review sessions, with the same stakeholders, influences their receptiveness or reluctance to use 
the VR-system. However, these studies do not delve into the impact of the shifting conventional roles between 
architects as “experts” and clients as “non-experts” on the design when using a VR system that supports both fully 
-immersive experiences (e.g., HMD) and non-immersive experiences (e.g., projector screen), combined with hand 
sketching. 
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2. STUDY DESIGN 


In order to better understand how client representatives and architects interpret the impact of fully immersive VR 
and how it influences them towards becoming more receptive or reluctant towards using it, two (2) real-world 
projects were evaluated. The study follows a qualitative approach with data collected by means of observations, 
video recordings and semi-structured interviews. 


2.1 Case study 


The case study was based on the design of a new elementary school in the municipality of Gothenburg, Sweden, 
which was ongoing in different phases of the design process (e.g., spatial coordination phase, technical design) 
(Ostime, 2022). The project primarily used 2D drawings and incentive to use VR came from the clients when it 
was recognized that 2D drawings and illustrated rendered still images used by the architect could not provide a 
sufficient level of spatial understanding in the client group. Background information regarding the case study have 
been analyzed based on the following criteria: 1) purpose of using VR for design review, 2) participants’ 
expectations prior and after each workshop as well as 3) outcome from having used VR in each of the workshops 
and how this influenced clients’ and architects’ stance on using VR. The case consisted of three workshop sessions 
in the following phases of the design process: preparation and brief, conceptual design stage and spatial 
coordination phase (Ostime, 2022). 


2.2 Participants 


To achieve sample representativeness, interviewees were selected based on the following criteria: 1) role in the 
design process, 2) prior experience of design reviews with traditional visual and information medias (e.g., 2D 
drawings, 3D models, physical mock-up rooms), and 3) involvement in ongoing projects for design of schools. 


Whilst the focus of this study was on the same 2 client representatives and 2 architects who participated in all three 
VR workshops, a total of 7 other participants were also interviewed. These were the interior architect connected 
to the case study as well as 6 exterior architects who all had participated in separate school projects together with 
the client representatives interviewed in this study. The projects in which these 6 exterior architects had been 
involved in, involved using HMD VR for both informative- (i.e., feedback from client not incorporated into the 
design) and design review purposes. 


It is also important to note that all participants in the studied case had prior experience of design review with 2D 
drawings. Moreover, all the client representatives in the studied case had experience of design review with 3D 
models whilst the architects had limited experience of design review with 3D models. Lastly, architects (exterior 
and interior) who had experience with VR had only used it for informative purposes (e.g., presenting the design to 
clients without incorporating any feedback). 


2.3 VR-system 


The Virtual Collaborative Design Environment (ViCoDE), a VR-system with support for fully- and non-immersive 
user-interfaces was used. It consists of several VR-headsets (e.g., Oculus Rift S kits), a multitouch table that 
facilitates collaborative design work with immediate, real-time feedback (i.e., object manipulation) (see fig 3) as 
well as a projector screen that mirrors the HMD users’ view inside the virtual environment. The multitouch table 
uses a top view to visualize the facility. Users can pan and zoom in this view using the same standard multitouch 
interaction features found in most smartphones. 


BIM-based components (e.g., loose and fixed furniture) are available via an asset library that is accessible on user- 
interface on the multitouch table, which can be added to the scene by drag-and-drop. Once added, a component 
can be repositioned, rotated, or removed, using the multitouch interface. The changes made on the multitouch table 
are then instantly updated in all the other connected user-interfaces (e.g., projector screen, HMD VR). 
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Fig. 3: ViCoDE set up with multitouch table and projector screen (left). Client representatives and architects 
design review via ViCoDE during the first workshop for project B (right). 


Moreover, the HMD VR user-interface allows users access to interactive features such as measurement and 
dimensioning with snapping, filtering and color-coding, 3D-markups, object information (BIM-properties) and 
3D-labels, section planes, miniature model (1:40 scale of the building design), multi-user and associated 
functionality (e.g., gather, goto), and BCF snapshots. During the three workshops, at least one person from the 
research team was available for supervision and providing help such as showing how to navigate with the HMD 
VR user-interface and how to use the various available interactive features. 


2.4 Data collection and analysis 
2.4.1 Interviews 


The 2 client representatives and 2 exterior architects who attended all three VR workshops were interviewed as 
well as the interior architect of the project who participated in only the first workshop session. The focus of the 
interview questions were based on assessing the interior and exterior architects’ and client representatives’ views 
on 1) expectations before each respective workshop, 2) how VR influenced the dialogue of design review and how 
different interactive features were used to resolve design issues and 3) reflection after each workshop session. 


Beyond the five interviews conducted with the 2 exterior architects, | interior architect and 2 client representatives 
involved in the project, an additional eight semi-structured interviews were also conducted with architects and 
client representatives who were not part of the case study but had collaborated with the same client representatives 
of the case study, on different projects. The purpose of these additional interviews was to gain a broader perspective 
on the client representatives’ preferences and working dynamics between architects and client representatives 
when HMD VR is used for design review purposes. The assumption was that insights from individuals who had 
collaborated with the same client representatives on different projects could offer valuable comparative insights 
into preferences for information and visualization medium as well as decision-making processes when VR is used 
for design review. 


These additional interviewees consisted of 6 exterior architects and 2 client representatives. The architects’ 
experience of using HMD VR were mostly limited to informative purposes (i.e., presenting the building design 
without incorporating any feedback) and using HMD VR for design review purposes only during single VR 
workshop sessions. These architects were asked about 1) their expectation before and after the single VR-workshop 
session and 2) what challenges they consider as necessary to address in order to increase the use of VR for design 
review purposes. The 2 client representatives had used HMD VR for both informative and design review purposes 
were also interviewed and were asked about 1) their expectation before and after the single VR-workshop session 
and 2) how use of HMD VR helped them assess design issues and provide feedback to architects during design 
review. 


2.4.2 Video recordings 


The case with its three workshops were video recorded, with a GoPro 360 camera for a total of 3h, with 45 minutes 
from each workshop being selected. The two stationary GoPro 360 cameras were placed in elevated positions to 
capture the participants’ collaboration, movement, and use of the different user-interfaces (e.g., multitouch-table, 
HMD VR) in the workshop room. The collected corpus of video data was transcribed for further analysis and later 
compared with the field notes and interview data. 


72 


The video data were analyzed by looking at the interactions between client representatives and the architects when 
resolving design issues as well as how both client representatives and architects each respectively interacted with 
the fully- (HMD VR) and non-immersive (1.e., projector screen and multitouch table) user-interfaces of ViCoDE. 
The verbal interaction between the participants was transcribed by one of the researchers. Segments of recording 
were selected based on when the greatest number of interactions took place between participants and the different 
user-interfaces of the VR-system in order to achieve sample representatives of captured data. 


2.4.3 Analysis of interaction pattern 


From the selected 45 min of video recordings of each workshop, 15 min were selected to analyze the interaction 
patterns between architects and client representatives and how both these type of stakeholders interacted with the 
fully- and non-immersive user-interfaces of ViCoDE. The selected time period for analysis of interaction patterns 
was based on 1) identifying parts of the selected video recordings where the architects and client representatives 
interacted the most with each other to resolve design issues, 2) interacted the most with the different user-interfaces, 
3) specific moments where key design decisions were made (e.g., reaching consensus on design issues based on 
the design review agenda) and 4) number of user-interfaces that were used to resolve specific design issues (e.g., 
revising different room layouts on the multitouch table and validating these via HMD VR and projector screen). 


These interactions are divided into three different groups: 


e Group: statements, callouts and interactions not directed to a specific individual, but more to the whole 
group. 

e Incoming: interactions directed to the person in question, such as a direct question and a request on the 
design 

e Initiated: interaction initiated by the person in question. This includes questions directed to other person 
initiated by this particular person. 


From the 15 min of each workshop that was video recorded, analysis was performed to count how many times the 
available user-interfaces — HMD, multitouch table, projector screen — were used by the different participants. These 
transcribed interactions were used to generate interaction graphs per workshop session with the different types of 
interaction documented in Microsoft Excel. The transcribed interactions were then imported to create a social 
network matrix using Gephi 0.10.1 to visualize these patterns emerging between users and the user-interface that 
they used (Bastian et al., 2009). The network comprises of nodes, representing participants in the different 
workshop as well as the available user-interface. The edges connecting the nodes are interactions between 
participants or participants and user-interfaces. Moreover, these edges are weighted by the amount of interaction 
occurring between the nodes, with distance and a bolder type of line indicating a stronger connection. Strong 
interconnection between participants/user-interfaces can be viewed in a cluster of nodes close to each other in the 
network as well. Lastly, the graphical layout algorithm selected for the social network is Fruchterman & Reingold 
layout algorithm (Fruchterman & Reingold, 1991), due to how it presents a good visualization of the interaction 
distribution. 


3. RESULT 


Based on the data analysis, six different visual interaction network graphs are presented, with two from each 
workshop showing the interaction between participants as well as the interaction participants had with the different 
user-interfaces of the VR-system. These graphs are presented together with the data captured in the semi-structured 
interviews as well as the video recordings. This was done to better understand how the use of the different user- 
interfaces of the ViCoDE system, used by the architects and client representatives, changed over the three VR 
workshop sessions. 
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3.1 Case study 
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Fig. 4: Architects were noticeably receiving questions about the design from client representatives. Conversely, 
client representatives coordinated decision-making among themselves and used the HMD and multitouch table. 


As illustrated by the edges of the nodes above the architects only used HMD VR once and instead used the non- 
immersive projector screen when viewing the virtual environment. Still, interesting to note is that when wearing 
HMD VR, design issues identified by client representatives were reviewed by architect 1 using the mark-up feature 
to assess the questions client 1 and 2 were asking. Rather than using HMD VR, both architects opted to view the 
projector screen, which displayed the perspective of HMD VR users, i.e., the virtual environment. Additionally, 
for a few minutes they used the multitouch table for discussing ideas and thoughts about the design. By seeing 
client representatives’ view from inside the virtual environment, architects directed the client representatives to 
different points in the building design. The architects’ non-immersive experience of the virtual environment via 
the projector screen was similarly observed in workshop 2 as well, with their fully-immersive (i.e., use of HMD 
VR) experience being limited to a few minutes during workshop 1. 


From the client representative perspective, the multitouch table was the main user-interface used during the first 
workshop. This was done to resolve design issues in spaces such as classrooms to identify design issues related to 
hidden sightlines and furnishment. Specifically, by using interactive features such as object manipulation on the 
multitouch table and multi-scale in the HMD, client representatives used these different user-interfaces of the 
ViCoDE system to implement a scenario-based approach during design review. For instance, when changing the 
furnishment layout and placement of walls and windows on the multitouch table, client 1 and 2 viewed in real- 
time via the HMDs that there were insufficient space which would prevent teachers from performing their daily 
work tasks (see fig 5). 


Fig. 5: Design issues identified and resolved in the first workshop via the multitouch table and HMD user- 
interface (left), which then were incorporated into the second workshop (right). 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


In the context of identifying design issues such as a lack of sufficient space, client representatives 1 and 2 who had 
experience interpretating 2D drawings, highlighted the difficulties with correctly assessing the design via 2D 
drawings. Explaining in the interview after the third workshop how “regardless of our experience, there is a 
tendency for us to miss design issues when using 2D drawings [Client representative 1]” but also “difficulties with 
understanding the volume of object, such as loose furnishment, the room space itself or window elevation [Client 
representative 2]”. This statement by the client representatives, however, contrasts those of the architects who 
shared in the interviews that they believed that “client representatives experienced with interpretation of 2D 
drawings have sufficient understanding for assessing the building design correctly [Architect 1]”. This belief 
among the architects could be explained by their previous experience of having worked with the same client 
representative when 2D drawings were used. 


From the architects’ perspective, the multitouch table was increasingly used in all three workshops. Even though 
HMD VR was not worn for more than a few minutes by either architect, with one of them explaining the reason 
being that “that we already know how to visualize the virtual environment in our heads by spatial reasoning, as 
we have been trained by practice... [architect 1]”, the multitouch table was the user-interface the architects 
interacted the most with. For example, whilst client representatives used all the different ViCoDE user-interfaces 
to identify and resolve design issues in the first workshop, the architects instead viewed the projector screen to 
discuss design issues with the client representatives. Then in the second workshop, both architects started to use 
the multitouch table more, as a result of helping client representatives who were unable to resolve design issues 
by themselves. Lastly, during the third workshop, they took the initiative to use the multitouch table and actively 
started to lead the discussions and in particular design issues related to building code requirements (see fig 6). 


Fig. 6: Workshop 1-3. Architects directing client representatives (left) and discussing ideas with them whilst 
sketching (center), to directly using and reviewing the design via the multitouch table (right). 


With none of the architects having used VR for design review purposes and only for informative purposes, the 
progressive interaction with the multitouch table indicated a certain acceptance among the architects. They used it 
for task-based scenarios, which could be shown and explored in VR (see fig 5 and 6). 
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Fig. 7: Client representative 1 and 2 who participated in all three workshops had frequent interaction with each 
other (left). Both architects also became more involved via their use of the multitouch table (right). 
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As shown by the node cluster in Fig. 7 (right), client representatives continued using the multitouch table in 
combination with the HMD VR to rapidly review design revisions via task-based scenarios. Moreover, as indicated 
by Fig. 7 (left), both architects started to interact more with the rest of the participants as well as starting to 
primarily use the multitouch table instead of observing the virtual environment via the projector screen. In this 
context, video recordings show how both architects first sketched ideas on how certain room design layouts for 
different classrooms could be arranged (see fig 5 and 6). Following this sketching procedure, the object 
manipulation feature of the multitouch table would be used to validate the feasibility of the design based on these 
sketches. Lastly, after having decided upon different layouts, the architects would view the projector screen in 
combination with client representatives using HMD VR to discuss thoughts and ideas about the design together 
with the client representatives. 


In workshop 2, the architects began using multitouch table and became more engaged in the design review process. 
Similarly to workshop 1, client representatives worked independently during design reviews, separate from the 
architects. In the context of resolved design issues, video recordings show how the first and second workshop 
focused on spatial zone relationships, hidden sightlines and furnishment of classrooms and different spaces. Once 
these design issues had been identified and resolved, 2D drawings were used alongside the multitouch table by 
both client representatives and architects. With the use of 2D drawings, the architects took a more active role 
during the design review and specifically the decision-making related to review of building code requirements. 
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Fig. 8: Contrasting the first workshop, architects initiated more interactions with client representatives (left) as 
well as interacted more with the multitouch table (right). Clients had similar trend in their interactions as before. 


As illustrated by Fig. 8, workshop 3 resulted in a shift in interactions from workshop 1. Specifically, during 
workshop 1, client representatives led the decision-making regarding the design and during this process used both 
the non-immersive (i.e., projector screen) and fully-immersive (i.e., HMD VR) user-interfaces of the ViCoDE 
system. This lead of the decision-making process by the client representatives differed from the 3 workshop. 
Instead of being on the receiving end of decision-making, architects now initiated interactions with the different 
client representatives. 


One explanation for the shift in decision-making could be based on the architects being unfamiliar with how to 
use VR-systems such as ViCoDE for design review, but slowly during three workshops grew more accustomed to 
the different user-interfaces. Another explanation is that after addressing design issues concerning spatial 
understanding of the building design, the architects took the lead in interactions and decision-making when 
reviewing the building code requirements, similar to a traditional design process involving 2D drawings. 
Connected to the last explanation, another possibility for the shift in decision-making could be that initially when 
client representatives were using the different user-interfaces of the VR-system, the architects were unsure on their 
“role” during design review with a VR-system such as ViICoDE. Specifically, whilst the video recordings show 
how the architects initiated interaction and discussions during the last workshop, the design review were mainly 
facilitated by the client representatives themselves. With client representatives doing design review mostly 
independent from the architect in the first two workshops, the role of the architects in these sessions was questioned, 
as evident by the incoming interactions shown in Fig. 4 but also as the video recordings show when the client 
representatives questioned design issues they previously were unable to identify during design review with 2D 
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drawings. Moreover, this challenge of the architects’ role in the design process was also pointed out in the interview 
with the interior architect involved in the project: 


“When using VR for design review purposes, I sense that end-users such as client representatives identify design 
issues before we (architects) do and following this, ask questions about the design that we usually are the first to 
ask whenever 2D drawings and 3D models are used. [interior architect] ” 


Whilst the architects’ lack of experience in using VR for design review purposes explains this statement, it could 
also explain client representatives’ receptiveness towards using the VR-system from the very first workshop. For 
example, whilst the architects were initially reluctant to interact with the multitouch table, client representatives 
actively and continuously used all user-interfaces of the VR-system and in particular, the different interactive 
features of both the multitouch and HMD VR. 


In general, results for all the three workshops showed client representatives and architects being receptive towards 
using many of the same user-interfaces of the VR-system. Whilst similarities in interaction towards using the 
multitouch table were apparent, equally apparent was the contrast in reluctance and receptiveness in using the 
HMD VR. From the architects perspective, they became more receptive towards use of interactive features such 
as object manipulation throughout the three workshop sessions. Specifically, by first sketching with pen and paper 
and then testing these different layouts on the screen showed how the architect initially started with directing client 
representatives via the projector screen to later in the second and third workshop actively being part of the 
discussions that took place. From the client representative perspective, receptiveness to both the non-immersive 
user-interfaces (i.e., multitouch table and projector screen) and the fully-immersive one (i.e., HMD VR) led to a 
decision-making process mostly independent of the architect. Moreover, this independence shifted in the second 
and in particular the third session when design issues related to building code requirements (e.g., daylight, distance 
between spatial zones) were reviewed. When reviewing these requirements, the architect was more receptive than 
the client representatives in using the multitouch table and with both type of participants showing reluctance 
towards using the HMD VR when reviewing building code requirements. 


4. DISCUSSION 
4.1 Participants’ interaction with user-interfaces of a VR-system 


Findings show that the different user-interfaces of the ViCoDE VR-system were used to varying degrees during 
the three workshops. On the one hand, client representatives from the first workshop all through the third session, 
interacted mainly with the HMD VR and multitouch table as well as the projector screen. On the other hand, the 
architects’ reluctance to use HMD VR for design review persisted throughout all three sessions. Yet, both client 
representatives and architects frequently used the multitouch table. 


From the client representatives’ perspective, different interpretations can be made as to why their interactions with 
the different user-interfaces were consistent. Firstly, with mainly HMD VR increasing their spatial understanding 
compared to 2D drawings (Chowdhury & Schnabel, 2020) client representatives were able from the first workshop 
to identify and resolve design issues that they previously unaware of. This unawareness of design issues, regardless 
of previous experience with interpretation of 2D drawings, could also be interpreted as the client representatives 
being more receptive towards using the different user-interfaces of the ViCoDE system and in particular HMD VR. 
This could be due to HMD being perceived as a more engaging information and visualization medium than 2D 
drawings (Johansson, 2016) with the ego-centric frame of reference client representatives had via HMD VR (Paes 
et al., 2023), helped them better assess different furnished layouts in workshop | and 2. Moreover, the design 
changes made in the 1* workshop with HMD VR and multitouch table and later incorporated in the 2"! workshop, 
can be interpreted as client representatives being provided with clearer visual cues (e.g., size of room, placement 
of windows) (Hermund et al., 2017) when perceiving volumetric qualities of the building design better in fully- 
immersed virtual environment (Chowdhury & Schnabel, 2020). Secondly, it could be argued that by identifying 
design issues such as hidden sightlines, furnishment and design of different spaces (i.e., workshop 1 and 2) via 
HMD VR, client representatives grew more receptive towards continued interaction with HMD VR but also the 
multitouch table. Thirdly, by adopting task-based scenarios during design review (i.e., workshop 1 and 2), it can 
be argued that collaborative practices such as Co-design further helped facilitate their interaction with HMD VR 
and the ViCoDE system at large (Roupé et al., 2020). Also, as observed during workshop 1 and 2, client 
representatives made design changes such as furnishment and revision to layout of different spaces independently 
from the architects. This independence is further acknowledged by the project’s interior architect when she 
explains how clients tend to identify and ask questions about the design in HMD VR before the architects do. This 


77 


suggests that Co-design is more likely to emerge when VR-systems with support for both HMD VR and object 
manipulation are used during design review, as client representatives become part of the design team (Caixeta et 
al., 2019). 


From the architects’ perspective, it is interesting to notice, due to how VR being described primarily as a 
presentation tool used by architects (Achten, 2021; Scheer, 2014), that interaction with different user-interfaces in 
the different workshops, was mostly limited to the non-immersive experience, offered by the projector screen and 
the multitouch table. By viewing client representatives’ HMD VR view of the virtual environment and using the 
projector screen to direct them in the building design, the architects’ experience of the virtual environment was 
limited to a non-immersive one. In this context, when using sketching with pen and paper prior to testing the idea 
on the multitouch table, we saw that the architects are more receptive towards interacting with user-interfaces they 
perceive as familiar with their own traditional design tools (e.g., multitouch touch table with top-view similar to 
2D drawings) (Scheer, 2014), rather than user-interfaces they do not have experience in using for design review 
purposes (i.e., HMD VR). This familiarity could be argued to be based on the top view perspective they are used 
to work with (e.g., 2D drawings) but also for how the use of interactive features such as object manipulation 
allowed them to seamlessly test and validate design proposals when swapping between sketching and multitouch 
table. This idea of familiarity could also explain why architect 1 in the first workshop, on their own accord, used 
the mark-up tool to better understand the client representatives’ question on the design. To this point, it can be 
interesting for future studies to investigate whether architects design review in multi-user HMD VR, together with 
client representatives, would affect their resistance towards engagement with HMD VR. For instance, what 
interactive features would be needed in HMD VR to result in a shift for architects viewing HMD VR primarily as 
a presentation tool (Scheer, 2014), to one of their primarily chosen mediums used for design review? 


4.2 Communication between participants when using VR-systems 


Results from the three workshops suggest that VR-systems with support for multiple user-interfaces and available 
interactive features enable client representatives to have a Co-design approach to design review. With the architects 
being questioned on their decisions made earlier with 2D drawings (workshop 1 and 2), we could see that in 
conjunction with client representatives doing design review independently, that their role in the design review 
context came into question. With the third workshop instead consisting of architects initiating interactions with the 
client representatives as well as leading the design review when reviewing compliance with building code 
requirements via 2D drawings, the questioning of architects’ hierarchical position is supported (Cruickshank et al., 
2013; Scheer, 2014). Also, with the explanation provided by the interior architect on how design issues and 
questions on the design are now instead initiated by client representatives rather than the architect, it can be argued 
that the role of architects during design review with VR, needs to be further explored to address their resistance 
toward engaging with HMD VR. 


Consequently, providing client representatives with the conditions to express their needs about the design (e.g., 
use of interactive features to enable design review via task-based scenarios) results in increased sense of ownership 
of decision-making. Nevertheless, with architects experiencing a loss of decision-making power (i.e., workshop 1 
and 2) (Maftei et al., 2018), we saw that VR used for design review purposes would be useful in collaborative 
practices such as Co-design. In this context of Co-design, it is noteworthy how the architects in all three workshops 
started developing solutions to different design issues firstly via non-immersive design tools, i.e., sketching by 
hand, and then shifted over to the non-immersive user-interfaces of ViCoDE, i.e., multitouch table and projector 
screen. Whilst this hybrid design environment with both non-immersive and immersive user-interfaces can help 
involved participants to immediately validate their design decisions (Okeil, 2010), it could be argued that the 
design creation/feedback/revision cycle between architects and client representatives would be more efficient if 
alternation between non-immersive (i.e., projector screen and multitouch table) and fully-immersive (i.e., HMD 
VR) environments was not required. With architects developing and revising their design ideas in a fully- 
immersive environment, they would also likely face a loss of predictability in their working methods (Cruickshank 
et al., 2013) and decision-making on the design (Scheer, 2014). As such, the “loss” of predictability in working 
methods and decision-making power could make architects in the fully-immersive virtual environment susceptible 
to being wrong about certain design ideas, i.e., how the needs and wants of the building occupants are considered 
in the design. 


Therefore, by immediately facing the consequences of their design ideas via use of task-based scenarios (Nikolić 
& Whyte, 2021; Roupé et al., 2020), it is more likely that the typical hierarchical position of the architect (Scheer, 
2014) is questioned, due to client representatives identifying and resolving design issues independently from the 
architect, as evident by 1‘ and 2" workshop. Although the setting of the workshops involved participants using 
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both a non- and fully-immersive approach to design review, it could be argued that the questioning of the architects 
decisions of the design would be further reinforced when the design review with involved participants is mainly 
done in a fully-immersive environment. As a result, it is important to explore the conditions architects need, to 
design review in fully-immersive VR. For instance, how would availability of object manipulation in HMD VR 
influence collaboration between architects and client representatives? Would collaborative practices such as Co- 
design more likely be facilitated with everyone doing design review in the virtual environment? Connected to the 
last point and as evident by the video recordings from workshop 1 and 2, the VR-system tested in this study, i.e., 
ViCoDE, meant that 2 out of 3 design review sessions were observed to be an isolating experience for the architects. 
This isolating experience meant that the architect’s typical role of a facilitator (Scheer, 2014), were not as apparent 
as in design processes involving traditional design medias (e.g., 2D drawings and 3D models). It would be of 
interest to further study what type of role, whether similar or a new one, architects need to take on when doing 
design review with fully-immersive VR (Maftei et al., 2018). Thus, these are just some of the questions worth 
exploring in future studies, to better understand the resistance to engagement with VR-systems. 
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COLLABORATIVE SITE LAYOUT PLANNING USING MULTI-TOUCH 
TABLE AND IMMERSIVE VR 
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ABSTRACT: Building Information Modeling (BIM) is changing the way architects and engineers produce and 
deliver design results, and object-oriented 3D models are now starting to replace traditional 2D drawings during 
the construction phase. This allows for a number of applications to increase efficiency, such as quantity take-off, 
cost-estimation, and planning, but it also supports better communication and increased understanding at the 
construction site by means of detailed 3D models together with various visualization techniques. However, even 
in projects with a fully BIM-based design, there is one remaining part that is still done primarily using 2D drawings 
and sketches — the construction site layout plan. In addition to not take advantage of the benefits offered by 3D, it 
also makes it difficult to integrate site layout planning within the openBIM ecosystem. In this paper we present the 
design and evaluation of a user-friendly, IFC-compatible software system that supports collaborative, multi-user 
creation of construction site layout plans using both multi-touch table and immersive VR. By allowing temporary 
structures, machines, and other components to be easily added and updated it is possible to continuously produce 
and communicate 3D site layout plans that are aligned with the schedule and supports integration with other BIM- 
tools. 


KEYWORDS: BIM, VIRTUAL REALITY, VR, OPENBIM, IFC. 


1. INTRODUCTION AND BACKGROUND 


With increased focus on digitalization and efficiency within the Architecture, Engineering, and Construction 
(AEC) industries, detailed Building Information Models (BIM) from the design are now often available for use by 
the contractor. This can facilitate the tendering process and make cost estimation and planning more efficient, but 
above all it supports enhanced communication and understanding during the production phase in the form of 
detailed 3D models with corresponding metadata. Furthermore, already today there are examples of so-called 
“drawingless” projects such as R6forsbron, Slussen, and Celsius, which clearly shows that the industry is moving 
more and more towards a situation where traditional 2D drawings are given less space (Cousins, 2017; Johansson 
& Roupé, 2019). In fact, in Scandinavia, Total BIM has emerged as a concept where the BIM is the legally binding 
construction document and no traditional 2D drawings are delivered to the construction site (Disney et al 2022). 
However, there is still one document that the contractors themselves have to create and keep up to date — the 
construction site layout plan. 


Currently, the site layout plan is often drawn up in 2D by default — often using Bluebeam — and although the work 
differs between projects, there are several recurring problems connected to it (Andersson et al., 2019). Gros (2019) 
investigated the work with site layout plans at one of Scandinavia's largest contractor and found that: 


Even if all of the design is done using BIM, the site plan is still usually in 2D 
Typically just one person working with the site plan 

The site plan is rarely updated and often differs from reality 

The work with the site plan is often linked to lack of time and stress 

Often poor communication and respect for the site plan (difficult to interpret plans) 


However, there are also several good examples in practice which have shown the possibilities of working with site 
layout plans in 3D, often created and maintained in SketchUp (Jongeling, 2013). 3D offers many benefits regarding 
elevations and general workplace organization in the vertical dimension, at the same time as it is easier 
communicate and present ideas around it. Still, this approach typically requires a modeling expert responsible for 
updating the plan, and in the end these plans tend to be exported as static 2D images instead of being integrated 
with other BIM datasets (Gros, 2019). 


Going beyond site layout planning in real-world projects, much research has focused on turning site layout 
planning into an optimization problem that can be automated, which — in many ways — is similar to using 
probabilistic and generative methods for automated creation of production plans and schedules (Taghaddos et al., 
2021; Abune'Meh et al., 2016; Kumar and Cheng, 2015; Isaac and Shimanovich, 2021; Fischer et al., 2018). At 
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the same time, there is also research that emphasizes the benefits of collaboration, teambuilding, and commitment, 
and instead advocate more focus on user-friendly software tools and various visualization techniques, for instance 
Virtual Reality (VR), to support the collaborative planning work (Tallgren et al., 2021, Tallgren et al., 2020). VR, 
in particular, can clarify aspects of the design that are difficult to comprehend from traditional 2D documents, and 
can better resemble real work environments — features that are useful when evaluating planning scenarios and 
reviewing constructability (Zaker and Coloma, 2018; Wolfartsberger, 2019). Given these properties, it is therefore 
logical that the use of VR has been tested also for site layout planning (Xu et al., 2020; Muhammad et al., 2019). 
In this context, and when compared to traditional 2D methods, VR has been shown to make the plan more effective 
to comprehend and to enhance the ability to detect clashes (Muhammad et al., 2019). Nevertheless, certain aspects 
of the layout planning are still considered to be more efficient in 2D, which tells us that instead of trying to choose 
between either one of these interfaces it would perhaps make more sense to try and combine them, which has been 
a successful approach for both urban planning and collaborative healthcare design (Faliu et al., 2019; Roupé et al., 
2020). In this paper we take inspiration from these ideas and present the design and evaluation of a multi-user, 
multimodal system for collaborative creation of site layout plans. The system combines multi-touch table and 
immersive VR, but contrary to similar approaches within urban planning and healthcare design much more focus 
has been put on integration within the openBIM ecosystem. 


2. THE COLLABORATIVE SITE LAYOUT PLANNING ENVIRONMENT 


To support a multi-user, multimodal planning environment we have used BIMXplorer and further customized it. 
BIMXplorer is a real-time desktop- and VR-viewer that directly supports the IFC file format and creation of 
federated building models (Johansson, 2016; BIMXplorer, 2023). IFC import is implemented using the xBIM 
framework (Lockley et al., 2017), and by taking advantage of efficient occlusion culling, BIMXplorer allows large 
and complex BIMs to be visualized in immersive VR without the need to simplify or decimate the input dataset. 
The VR user interface — explained in detail in (Johansson and Roupé, 2022) — consists of a tools palette with 
support for sectioning, measurement, filtering, markups, BCFs, and multi-user sessions (Fig 1). In the following 
subsections we further describe the multi-touch as well as VR interface that were developed to support user- 
friendly, collaborative creation of site layout plans. Fig 2 presents an overview of the new system using a sample 
configuration with both co-located and remote clients. 


Fig. 1: Examples of different tools and models in BIMXplorer. 
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Fig. 2: System overview using a sample configuration. 
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2.1 Prefab database creation 


Site layout planning in 3D mainly consists of placing 3D-objects that represents temporary structures, machines, 
and temporary placements for materials as instances in a 3D-environment. We refer to these objects as prefabs, 
and the motivation around the creation and organization steps was to allow easy creation from already present 3D- 
models or BIMs. We therefore implemented support for also importing .skp- and .fbx-files using their respective 
APIs, and then simply implemented a tool in BIMXplorer to select and save a single or multiple objects as a prefab, 
with the current view as the preview images (Fig 3, middle). Depending on the type of component it also makes 
sense to be able to select the center of rotation (i.e. during planning), which is why we optionally support placement 
of a pivot point using a standard translation gizmo. Finally, with a number of prefabs created, a user can then 
organize and group the prefabs in different folders and subfolder (Fig 3, right) before selecting a root-folder, which 
will then import and create a prefab database file (.pfd-file), which will have the same organizing structure. 
Assuming a very large number of prefabs are originally created, this makes it possible to select a subset for a 
certain planning workshop, like a template. The whole procedure is illustrated in Fig 3, where a SketchUp scene 
is first imported (left), followed by selection and isolation of a skip container that is save as a prefab (middle), and 
finally the folder structure where it is saved (right). Note however that this only has to be done once, to create a 
pfd-file, which can then be re-used as a template in several planning workshops. 
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Fig. 3: The prefab creation process in BIMXplorer and file structure. 


2.2 Desktop UI and touch interface 


The desktop, multi-touch interface is inspired by our previous work for healthcare environment design (Roupé et 
al., 2020), but implemented on top of BIMXplorer and much more adopted for use in a openBIM ecosystem. The 
touch interaction is a custom implementation using “raw” touch events in Windows, i.e. listening to WM_TOUCH 
events. The actual interface follows that of StreamBIM, with two-finger pan-and-zoom, one-finger for look-around 
in 3D as well as scrolling in menus, and one-finger tap for selection and button pressing. No inertia is used. The 
UI is implemented with Dear IMGUI and has a collapsible toolbar with functionality for adding objects, sectioning, 
visibility and filtering, settings, and file I/O. As seen in Fig 4, sectioning is done by selecting a level/floor from 
IFC-data and can then be adjusted up or down. With BIMXplorer already using Dear IMGUI for the tools palette 
in VR, it was possible to directly re-use certain Ul-element, such as for filtering and sub-model visibility. In this 
context, the filtering capabilities are particularly interesting as it allows for controlling visibility and colors of 
objects based on their properties. This makes it possible to filter out certain scenarios if the data is available in the 
IFC-file(s), such as subcontractor or scheduling information. For instance, if scheduling information is present, 
it’s possible to filter out only those objects that will be constructed at a certain point in time, making site layout 
and logistics planning adhere to the real construction schedule. 


Fig. 4: The multi-touch table interface, including sectioning-by-floor, and filtering in the top-down view. 


All the imported prefabs (i.e. the prefab database) are accessible in a folder structure and are added using drag- 


83 


CONVR 2023. PROCEEDINGS OF THE 23° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


and-drop as illustrated in Fig 5. Selecting an object by tapping brings up the context menu making it possible to 
hide or delete the object. A selected object — or multiple selected objects — can directly be moved horizontally by 
dragging or rotated using the “gizmo”. By toggling one of the context menu buttons, vertical movement is activated 
instead. 


Fig. 5: Adding prefabs using drag-and-drop (left), and context menu and rotation gizmo (right) 


2.3 VR interface 


A similar interface for adding objects is implemented in VR as well by dragging and dropping prefabs from the 
tools palette, as seen in Fig 6, left. In fact, as this is done using Dear IMGUI the actual code is almost identical, 
which is one of the main benefits of using the same UI toolkit for both 2D desktop and immersive VR. Moving, 
re-placing, and rotating objects is also similar, but with gizmos more adapted for use in a pure 3D environment (as 
opposed to a 2D desktop interface). 


Fig. 6: Adding prefabs using drag-and-drop in VR (left), view of VR avatar in other clients (right) 


2.4 Multi-user and collaborative planning 


The multi-user functionality already present in BIMXplorer was extended to also support collaborative planning 
and adding, translation, and removal of instanced prefabs. The implementation is based on the Photon Realtime 
SDK and uses no other server infrastructure. All clients load the same model (.bmx-file) and prefab database (.pfd), 
and then call “JoinOrCreateRoom” (Photon API) with a previously agreed upon meeting ID. The first client that 
calls this function performs the actual creation of the Photon “room”, and all other clients will then join it. Every 
modification to the shared environment, such as adding or translating objects, creating 3D-markups, or 
hiding/showing objects is transferred to all clients with the use of Photon events. These events use the 
“SendReliable” and “Cached Event” functionality in Photon to make sure that even if a client is connecting much 
later than the other, that client will still receive all the modification events that have already happened when joining. 
Position and orientation of all the clients (i.e. the avatars), on the other hand, is using “SendUnreliable’’ because 
it is regularly updated anyway. However, in either case, no 3D-data is ever sent over the network, just IDs and 
transformation matrices. The only exception is 3D markups which are represented as a polyline with 3D 
coordinates. Still, all clients must be able to uniquely identify objects and prefab instances even if created locally 
on a single client. The solution was to simply let each client generate and assign a GUID when adding or creating 
a new object (using CoCreateGuid). 


2.5 IFC export and openBIM 


As previously stated, one of the main challenges when considering site layout planning in a modern BIM context 
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is the need to integrate and align different data sources, from design as well as from production. In essence, this 
means that we can no longer only produce images and 2D-data, but instead also needs to provide 3D-data. As the 
solution to consume BIM-data is through the IFC file format it thus makes sense to also use that for producing 
data. Fortunately, the xBIM framework that is used in BIMXplorer to import IFC-files also has functionality to 
create IFC-files. With the underlying geometry representation in BIMXplorer being indexed triangular meshes we 
have chosen to use IFC4 which has support for “/fcTriangulatedFaceSet”. The possible options exposed for IFC- 
export are everything, selected, or visible, which means that it is also possible to only export a subset of the planned 
components as an IFC-file. This makes it possible to separate the exported IFC-files both temporally and spatially. 
Furthermore, by using the BCF functionality it is possible to transfer additional information, either back to the 
design organization or as viewpoints or “points-of-interest” for on-site mobile communication platforms, such as 
Dalux or StreamBIM. Example of both IFC- and BCF-export is seen in Fig 7, where the sample layout shown in 
Fig 5 is exported as an IFC4 file and imported into Solibri together with two BCF viewpoints. In Fig 8 the 
openBIM-supported model- and dataflow is illustrated. 
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Fig. 8: Schematic illustration of the openBIM-supported 3D model- and dataflow. 
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3. EVALUATION 


The developed system has been evaluated during a workshop session with representatives from the construction 
industry. The primary focus during the workshop was around safety and specifically to see if immersive VR could 
provide benefits in detecting hazardous situations compared to only using traditional 2D drawings. As part of that 
investigation the site layout planning functionality was tested and evaluated with respect to placing guardrails and 
temporary covers. The complete setup during the workshop can be seen in Fig 9. A single, large touch screen was 
used together with two VR headsets, one of them also connected to a projector. Seven (7) participants, both from 
design and production, took part in the exercise which lasted around three hours. The test case was the 6th floor 
of the Kineum project, a 27 story tall building recently constructed in Gothenburg, Sweden (Fig 10). This project 
was chosen due to the sheer size and complexity, but also because design documents included both BIMs and 
traditional 2D drawings. In the first part of the exercise the participants where ask to identify areas that can be 
hazardous during construction using only 2D drawings. In the second part the same was done, but this time in VR 
using multi-user. In the third part, safety equipment, such as guardrails and covers, was placed and updated using 
the site layout planning tools. However, note that the possibility of adding and moving objects in the immersive 
VR interface was not implemented at the time of the workshop. Still, all modifications done on the multi-touch 
table (add, move, delete, etc.) was updated in the VR interface. As an additional and final step, the evaluation was 
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completed with a post-workshop interoperability test. 


Fig. 10: The Kinum project; BIM, structural-only BIM, construction, completed building (left to right). 


4. RESULTS AND DISCUSSION 
4.1 2D vs. VR and multi-touch table for safety review 


Except for the outer perimeter, there were mainly four areas that required safety precautions; three large openings 
and the area around the elevator shafts. From the 2D exercise only two participants managed to identify all of them. 
Among the other participants there were various differences, but they all missed the triangular shaped opening, 
which is actually quite difficult to spot in 2D due to its somewhat uncommon shape. However, when moving on 
to the second part in the exercise, using VR, all areas requiring safety precautions were easily identified by all 
participants. On the one hand, this could be seen as an unfair comparison, considering that there were elements of 
collaboration with the multi-user setup. On the other hand, this was the first time using VR for four of the 
participants which introduce an extra layer of mental workload before navigation and interaction with the different 
tools are fully understood. Regardless, it was clear from the participants’ response and comments that VR provided 
a much more immersive and true-to-scale experience that allowed all of them to easily detect all floor openings, 
including the one that most missed in the first part of the exercise. Still, already by inspecting the model top-down 
view at the touch table, openings were much easier to detect than in the 2D drawing alternative due to the 3D 
perspective and shading. However, VR was considered particularly valuable in complex situations that are difficult 
to assess through traditional methods. Furthermore, with the general understanding that every action in 
construction carries some level of risk to workers' safety, all participants acknowledge the unique properties of VR 
to understand and comprehend complex design choices from a safety and constructability perspective, thereby 
improving safety planning and design. Finally, in addition to simple identify the hazardous areas, the participants 
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were also asked to use the markup tool to illustrate suitable safety measures, as seen in Fig 11. 
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Fig. 11: Participants used the markup tool in VR to illustrate where safety measures were needed. 


4.2 Planning and collaboration 


In the third part, the top-down interface on the multi-touch table was used to place safety measures around the 
identified hazards, the results of which can be seen in Fig 12. One participant had experience from similar planning 
settings and specifically around the difficulties in getting people to interact with BIMs and 3D models on touch- 
and smartboards using desktop BIM-viewer applications and was surprised to see how easy and intuitive it was to 
add new components to a highly complex BIM using the multi-touch interface: 


“We have always had problems in the past of getting non-experts and 'normal' people to be able to interact with 
these large and complex BIMs” 


The main activity was around the touch-table and in that respect some participants indicated that they felt a bit 
isolated when immersed in VR, even if this was a multi-user session. Similar observations have been noted in 
previous research as well (Roupé et al., 2020; Truong et al., 2021). However, this might have been different if the 
functionality of adding and moving objects also in VR had been available during the workshop. Other than making 
markups, the VR users could only tell the other participants what to do and therefore became more of observers 
and reviewers, than that of creators. Still, participants around the touch-table liked that they always could see 
where VR viewers were. On the other hand, it was clear that immersive VR and the 1:1 scale was superior in order 
to understand narrow or wide space and to identify safety and constructability issues. For instance, the initially 
placed safety precautions for the elevator shaft section were later identified as too "light", which was not as easy 
to spot in the top-down desktop interface, but very obvious when seen in a first person, true-to-scale perspective. 
This actual combination and collaboration using multiple interaction and visualization interfaces was also 
identified as an efficient setting in order to increase understanding and share and exchange knowledge across 
professional disciplines. In particular, it was stated that the collaborative walkthrough between design and 
construction safety team can increase designers’ awareness and foster designer contribution for safety planning. 


In addition to the request for also adding and moving components in the VR interface, there were several 
suggestions for improvements and also some identified issues. One suggestion that came very early during the 
workshop was to implement more of a polyline-drawing-tool for the guardrails, as it was found a bit inefficient 
and time-consuming to drag and drop all the individual sections and then place and rotate them correctly around 
the openings. To some degree this was also made extra cumbersome because the ray-intersection routine (i.e. hit- 
testing) does not use a dedicated collision shape but instead uses the actual geometry, which in the case of the 
guardrails consists of thin bars that are difficult to hit. The concept of a polyline drawingmode was also suggested 
for an area-drawing tool (i.e. surfaces). Even if 3D components are preferred as representations several participants 
highlighted the need to be able to also illustrate areas. Further suggestions included the possibility to group objects 
together to form composites and to be able to copy-paste objects (both individual and composite groups). 


Fig. 12: Guardrails, temporary covers, and fences placed using the top-down interface on the multi-touch table. 
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4.3 Interoperability 


All the safety equipment that was added to the project during the workshop was exported as a single IFC-file and 
then imported into both Solibri and StreamBIM together with other disciplines in the project (e.g. architectural, 
structural, MEP, etc.), which can be seen in Fig 13. In effect, this introduces a new discipline to the federated BIM 
— Temporary Safety Components. Although the IFC-export wasn’t actually used during the workshop (i.e. it was 
imported into StreamBIM and Solibri at a later time), the functionality was discussed and the prospect of having 
an IFC-file with temporary structures and site objects as output sparked many ideas from the participants. For 
instance, it meant that all other tools that are part of the BIM palette, such as automatic quantity takeoffs and clash 
detection, could now also be used for temporary structures and components. However, regardless of future 
applications, the interoperability test successfully completed our initial evaluation which shows that, not only can 
non-designers collaboratively create construction site layout plans in 3D, but also directly integrate and use this 
3D data together with all the other BIM sources received from the design organization. 
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Fig. 13: The “temporary safety components” IFC-file imported into Solibri and StreamBIM. 


5. CONCLUSIONS 


In this paper we have present the design and technical details of a user-friendly, IFC-compatible software system 
that supports collaborative, multi-user creation of construction site layout plans using both multi-touch table and 
immersive VR. In addition we have presented an initial evaluation of this system with respect to safety review and 
planning and layout of temporary safety components. For the specific task of identifying hazardous situations, the 
presented system was found to be more efficient compared to only using traditional 2D drawings. The multiple 
interaction and visualization interfaces were found to complement each other and to provide an efficient 
environment for collaboration and knowledge sharing across different professional disciplines. In this context, the 
immersive VR interface was found to be superior in order for users to understand space, dimensions, and complex 
designs, whereas the multi-touch interface was considered very intuitive and easy to use with suitable tools for 
adding and modifying 3D components. With the ability to also export the planned environment as an IFC-file the 
system has been shown to support creation and continuous update of 3D site layout plans that can be fully 
integrated with a projects other BIM sources and tools. 


For future work it would be interesting to implement some of the request and suggestions that were proposed 
during the evaluation, such as a polyline drawing tool for guardrails and grouping of components. Furthermore, it 
would be interesting to explore and evaluate the system with a more dedicated focus on all aspects of construction 
site layout planning, not only safety. 
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ABSTRACT: As never before, during the COVID-19 pandemic, the effectiveness of the digital design strategies 
on the user s well-being has been questioned. However, a research branch astride digital design and neuroscience 
able to overcome net discipline borders to analyse users’ well-being seems to be lacking. Today mainly qualitative 
data are used in the design field for the investigation of users’ quality experience. Although fundamental, they also 
have great disadvantages such as unanswered questions, unconscientious responses, and respondents’ biases. As 
such, a systematic state of art review is presented to find methodologies and tools currently used in medicine to 
identify the impact of digital design strategies (XR) on users’ well-being through quantitative and objective data. 
The main technologies used for this purpose have been synthesized in a schematic chart by reporting the principal 
related biometric data (skin conductivity, heart rate metrics and breathing rates), as well as other technologies 
such as video/images/audio analysis based on sensors and machine learning to reach out mass numbers. In 
conclusion, gaps and future applications of this innovative approach within the virtual environment have been 
identified by the authors. 


KEYWORDS: extended reality, virtual reality, neuro-design, digital design, immersive experience, user 
experience, well-being assessment 


1. INTRODUCTION 

In these current times and especially within the COVID-19 frame, we are witnessing design calling itself under 
question by focusing on the redefinition of the relationship between users and spaces. Although the importance of 
quality space has been considered a central point in design planning, the current pandemic is indeed pointing out 
a global dissatisfaction in this regard (Melone & Borgo, 2020) (Amerio et al., 2020) (Alraouf, 2021). Concurrently, 
in this perspective, the advanced technologies - such as eXtended Reality (XR)- are increasingly used to early 
acknowledge people responses in terms of spaces’ satisfaction. However, in this regard, is it possible to take a step 
back and understand the impact that such immersive technologies have on users' conscious and unconscious 
responses? We’re witnessing an increasing digitization both in our daily lives and in the working environment to 
such an extent that the virtualisation of the spaces -for knowing the customer satisfaction in advance as well as for 
creating an alternative world- is becoming an increasingly debated issue. For this reason, it becomes critical and 
extremely relevant to understand what impacts immersive spaces have on people's well-being. Although the 
importance of user experience in terms of well-being has been traditionally recognised, few studies have been 
conducted in the evaluation through scientific wellbeing detection in virtual spaces. Furthermore, despite the 
medical concurrent and ongoing findings for stress investigation, the implications of these results appear to 
struggle to be applied in digital design and in design in general. In this sense, medicine is today perfecting an 
ongoing and relevant theme of stress detection analysis since increasingly stress is becoming a serious problem 
for users’ productivity and efficiency in modern society (Feng et al., 2021) (Attallah, 2020). Although stress is one 
of the major contemporary problems, it is difficult for people to perceive even if they are subject to high stress 
levels or not (Sagbas et al., 2020) and for this reason the research field is working on a method that is able to return 
real-time stress detection. The role of neurodesign should be able to spotlight and investigate not only the effects 
of environmental factors on people’s behaviour but also study user’s biometric parameters, to inform contemporary 
digital design. However, even if today there is an increasing and updated interest on the influence of the digital 
design on public health, there is at the same time a significant lack of research (Burton et al., 2011). The research 
methodology has been conducted through a systematic state of the art tools for well-being and stress detection in 
order to define the main technologies to use for future applications within the immersive digital design. 

The methodological review begins with an extensive literature search to attain the desired research items, namely 
the well-being detection techniques and tools and it ends, through its synthesis, with the interpretation of the most 
relevant articles. To obtain pertinent articles on the topic, “Scopus”, “Web of Science” and “Pubmed” were used 
as the primary scientific search engines. In order to collect the most used techniques for people well-being by 
overcrossing net disciplines boundaries through a interdisciplinary approach, the choice fell to these three 
databases due to their huge coverage of peer-reviewed journals and conference proceedings in environmental 
psychology, medicine, construction, architecture and design. The time span for publication was not restricted to 
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recent times due to the willingness to maximise the inclusiveness of the searched items and obtain a wider possible 
framework. The first step of the literature research has been identified with the definition of the research scope and 
identification of keywords. 

After that, a short listing of the most relevant articles on the theme have been pursued by accurately analysing 
them for the research purposes, in order to identify the most significative and representative studies. To provide a 
comprehensive review of the existing literature, this research led to an in-depth study and qualitative analysis of 
the contents, to present the main analysis carried out on the subject theme. As such, after a first reading of the 
articles, a deepening phase with the gathering of the main used techniques for well-being detection for evaluating 
immersive experience was collected. Thus, stress detection tools currently used in medicine have been investigated 
to understand if could be possible to apply these for well-being evaluation in the frame of immersive digital 
environment. 


2. LITERATURE REVIEW FINDINGS 

This section aims at providing an in-depth discussion of the important findings of the reviewed literature with the 
aim to underline the well-being measurement techniques to be exploited in construction. 

The organisation of this section is twofold: first, it presents a collection of stress detection parameters; then, it 
provides insights regarding the tools to detect stress. Despite the extensive acknowledgment of the impact of the 
built environment on users’ well-being and the recent advancement of smart technologies, just a limited attention 
has been dedicated to digital design studies of the consequences of immersive environments on the health of 
occupants. Furthermore, to validate virtual spaces, even a more restricted consideration has been dedicated to the 
application of smart technologies derived from different research areas. Thus, a focus on the implementation of 
operating procedures for assessing health and well-being were analysed and reported in the present paragraph by 
especially referring to psychological and medical research sphere that could be employed in construction and 
occupancy evaluation. However, not all the initially founded methods have been reported due to the inconsistency 
between the encountered medical techniques and the application in the digital immersive design field. The below- 
listed stress detection parameters and tools have been investigated due to their possible use for design research 
purposes. 


2.1 Identification of the main stress detection parameters for stress detection to be 
implemented for evaluating immersive experience 

The following sub-paragraph outlines the output of the research investigation through a brief synthesis of the main 

techniques adopted for stress detection. As follows, 10 kinds of stress detection adopted data have been identified 

as well as 6 main techniques to collect them. The following list represents the techniques most used in medicine 

and psychology for which exist technology and techniques that could be used also in immersive environments for 

planning process, management and control of virtual reality setting, as well as the detection of quality experience 


(Fig.1) 


(1) Electrodermal activity (EDA) 

Electrodermal activity (EDA) also known as Galvanic Skin Response (GSR) or Skin Conductance (SC) is an 
objective tool of stress indication. EDA measures the changes in conductance at the skin surface due to sweat 
production that is representative of the intensity of our emotional arousal. It could be considered as a non- 
intrusive control tool, and for this reason it has been used in many studies thanks to the use of wearable devices 
(Acerbi et al., 2017; Anusha et al., 2020; Debard et al., 2020; Delmastro et al., 2020; Kalimeri & Saitis, 2016; 
Minguillon et al., 2018; Mozos et al., 2017) or embedded sensors (A ffanni et al., 2018; Sriramprakash et al., 2017a; 
Zalabarria et al., 2017) able to detect it. 


(2) Heart Rate Variability (HRV) 

While Heart Rate is the average number of beats in a minute, Heart Rate Variability (HRV) is defined as the 
standard variation of inter-beat intervals (Elzeiny & Qaraqe, 2018). HRV could be considered a biometric 
parameter upon which to determine people’s stress conditions. Thanks to the use of a tool able to detect heart 
signals, HRV could be easily collected through wearable devices (Acerbi et al., 2017; Debard et al., 2020; Rani et 
al., 2002) or other monitoring tools with specific sensors (Mozos et al., 2017; Reanaree et al., 2016; Sriramprakash 
et al., 2017b; Zalabarria et al., 2017)(Mozos et al., 2017) or a traditional ECG. HRV could be also combined with 
social media microblogs (Feng et al., 2021). 


(3) Electroencephalogram (EEG) 


Electroencephalogram (EEG) is a tool to detect real-time stress in daily life by means of the use of specific headsets 
and its signals. Many studies analyse stress levels thanks to the use of brain electrodes in this technique (Attallah, 
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2020; Elzeiny & Qaraqe, 2018; Kalas & Momin, 2016; Reanaree et al., 2016) and helmet(Kalimeri & Saitis, 2016). 
Although from a medical point of view it could be a non-invasive method thanks to the use of scalp surface, from 
a perspective of stress monitoring, on the contrary, it is quite intrusive since it requires the use of electrodes. 


(4) Electromyogram (EMG) 

Electromyography (EMG) could be considered as another stress alarm system. EMG measures muscle response 
for evaluating electrical activity. It is not easy to apply this method for stress detection in a built environment. 
Several studies use this method for real-time detection of stress levels (Elzeiny & Qaraqe, 2018; Ghaderi et al., 
2015; Minguillon et al., 2018). 


(5) Cortisol 

Cortisol is a hormone made by the adrenal glands that control mood and fear, and it is one of the most used 
biomarkers for stress levels. Salivary cortisol increases the brain’s use of glucose and is one of the most indicative 
factors which can analyse stress level. It could be measured by means of pipettors although it is not the most 
suitable for the built environment even if it is one of the most used in medicine as demonstrated through the huge 
usage (Pascoe et al., 2017; Qiao et al., 2017; Wells et al., 2014). 
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Fig. 1: Non-intrusive and intrusive parameters for evaluating immersive experience 


(6) Human Body Temperature 

The temperature of the human body could be one of the main data factors upon which it is possible to study stress. 
According to Rachakonda et al. (2019), in fact variation in body temperature is indicative of the physical and 
mental condition of people within a specific value range to identify high, medium and low stress. Many studies 
detect stress through this method by means of contact sensors ((Bin et al., 2015; Rachakonda et al., 2019) or non- 
contact sensors ((Elzeiny & Qaraqe, 2018). 


(7) Pupil Diameter 
Pupillometry is a primary index to investigate psychological phenomena. The diameter of the pupil reflects the 
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correlation between its dimension and a human’s well-being. The pupil can expand (this phenomenon is called 
mydriasis) or shrink (in this case it is defined as miosis). When the human body is under stress it induces mydriasis 
and hence pupil dilatation that can be measured through accurate tools that have also been used in the analysis of 
stress detection (Al Abdi et al., 2018; Gunawardhane et al., 2013). 


(8) Breathing Rate 

Another stress biomarker response is given by Breathing Rate. The respiratory pattern can be altered by stress and 
it can easily be measured with wearable devices(Can et al., 2019; Mozos et al., 2017) or specific tools (Al Abdi et 
al., 2018). If hyperventilation occurs (around 25/40 breathes per minute) the subject can be considered to be under 
stress. 


(9) Sensor data (accelerometer and gyroscope) 

Another index of stress can be indicated through data obtained by accelerometers and gyroscopes. The 
accelerometer sensor gives real-time information about motion and the related stress interpretation of data as 
shown in some research projects (Debard et al., 2020; Sağbaş et al., 2020). 


(10) Real-time Video-Facial Muscle Detection 

Video-Facial Muscle Detection demonstrates how a bespoke machine learning support vector machine (SVM) can 
be utilized to provide quick and reliable classification. Facial Muscle Detection Algorithm, machine learning and 
deep learning are today increasingly used for detecting stress (Healy et al., 2018; Zhang et al., 2020a). 


(11) Others 
Moreover, other studies focus on hand movements (Reanaree et al., 2016), tweeting content (Zhao et al., 2016) 
keyboard typing (Sağbaş et al., 2020; Vizer et al., 2009) and audio detection (Abburi et al., 2016). 


2.2 Identification of the main adopted techniques for stress detection to be implemented 
for evaluating immersive experience 

In this section the main techniques used for collecting stress detection’s parameters have been outlined and 

synthesised as schematically reported. As mentioned in the previous paragraph, several parameters could be 

analysed for stress detection purposes and along this line, the main adopted techniques to detect (sometimes even 

simultaneously) well-being variables have been analysed (Fig.2). 

Among others, it is worth mentioning: 


(1) Wearable devices 

Wearable devices are the tools most used to detect stress due to their versatility as well as their non-intrusiveness. 
Moreover, these kinds of instruments are accessible to all and for this reason they can be easily chosen for daily 
stress detection studies (Anusha et al., 2020; Debard et al., 2020; Delmastro et al., 2020; Mozos et al., 2017). In 
regard to physiological data collected, the majority presented on the market are able to detect HRV, EDA, BR, 
hand movement. Moreover, some smart wearable systems can collect ECG measurement such as Biopacs MP150, 
MP35 and Shimmer Sensing 3 (Can et al., 2019). Due to their non-invasiveness, they are sometimes employed 
without the user being aware of it. 


(2) Smartphones 

Among the unobtrusive devices for ten collections of physiological data, the common smartphone should be 
mentioned. Multiple features can be extracted from smartphones such as: accelerometer, audio classification, the 
time and duration of calls, light sensor data, gps information, screen mode changing frequency, videos, wi-fi 
conversations and so on (Gjoreski et al., 2015). The correlation between stress and collected smartphone data 
produces significant results. However, according to Can et al. (2019) the low classification of accuracy highlighted 
by Gjoreski, suggested the importance of adopting an integrated method with the support of the use of wearables 
and to not rely just on smartphones. 


(3) Machine learning 

Wearable devices as well as smartphones generate a massive amount of data to be processed, which sometimes 
necessarily requests the support of machine learning techniques, a branch of artificial intelligence. The issues of 
the big data generated, as well as their continuous flow demand algorithmic calculation for combining usage 
behaviours and collected data (Delmastro et al., 2020; Sağbaş et al., 2020). To provide a reliable categorisation, 
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bio-parameters such as Breathing Rate, Galvanic Skin Response and Heart Rate are usually collected and analysed 
by machine learning systems via the main common classifiers such as K-nearest neighbour (KNN) and support 
vector machine (SVN) (Ghaderi et al., 2015). Moreover, in the scientific literature, new models for machine 
learning have been created specifically for detecting emotions through human face recognition (Healy et al., 2018). 


(4) Neurosky headset 

Stress detection can also be analysed through an intrusive wearable device, namely the EEG Neurosky headset 
which is a tool to monitor and record the electrical activity of the brain via electrodes placed in the headset. 
According to Reanaree et al. (2016) the Neurosky Headset could also be complemented by an intelligent watch 
made by Arduino that has been used in his project (Reanaree et al., 2016). 


(5) Applied sensors 

A number of applied sensors for Galvanic Skin Response (GSK), Electrocardiogram (ECG), 
Electroencephalogram (ECC) are available on the market. Differently from wearable and smartphones, these 
applied sensors are invasive, and the user is conscious of being under observation without specifically knowing 
the reason why. Although these kinds of applied sensors are different from each other, they can collect multiple 
signals or one single bio-parameter. At the same time, they can be both easily be portable and/or not movable 
(Attallah, 2020; Kalimeri & Saitis, 2016; Minguillon et al., 2018; Pandey et al., 2016). 


(6) Images/video/ audio capturing tools 

Other fundamental tools to be considered other than smartphones are video, audio and image-capturing devices. 
Among them, especially used for reaching out to a large number of people rather than to an individual person, are 
video cameras and contact-free camera sensors. By guaranteeing a cost-effective system, these are the most 
frequently used tools to detect users! facial expressions (Abburi et al., 2016; Zhang et al., 2020b). 
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Fig. 2 Non-intrusive and intrusive techniques for evaluating immersive experience 


3. DISCUSSION 

In order to use them for future applications within the analysis of wellbeing in the digital immersive field, the 
present review summarizes the main technologies used in medicine. This investigation aims to disseminate tools 
and methodologies to make designers conscious of the actual impact of the digital environment in daily life and to 
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encourage planners to better design spaces by bearing in mind the impact of immersive environment on the users’ 
well-being. To address this, a holistic approach is required since the comprehension of the high potential of a 
interdisciplinary concept that moves from the medical field to design is crucial. Based on the lessons learned from 
the COVID-19 pandemic, the role of this integrated approach is to provide an informational method that could 
reduce the gap between architects, engineers, and designers in regard to the expressed or unexpressed responses 
of users in terms of digital impact of immersive environments. As such, this review investigates the stress bio- 
parameters to be adopted in immersive digital design to outline possible indicators of high-quality satisfaction for 
new digital environments, by using not only qualitative data such as interviews and self-reported evaluation, but 
also quantitative data given by body feedbacks that inform through their unconscious responses. In this paper, the 
authors have reported two macro areas, namely stress bio-parameters and the related tools, to be applied in design 
as possible well-being indicators. The most recurring stress parameters as well as the main tools embraced in other 
fields, have been reported to give a complete overview of the current usage from which digital immersive design 
could extrapolate the non-invasive techniques with which to ascertain the impact of design strategies on the user’s 
satisfaction. Our body is affected by the choice and design strategies adopted by virtual reality architect, engineers, 
and designers. There must be a scientifically recognised method to evaluate their implications for users’ well-being 
that goes beyond the traditional qualitative data susceptible to respondents’ bias. Choosing the most appropriate 
tool depends on the availability of resources, targets, and specific research purposes. Some advanced techniques 
such as video and picture analysis through machine learning, if properly used, could be beneficial for reaching out 
to the well-being analysis of a mass numbers of users. Therefore, the measurement of eye-pupil diameter and facial 
muscle movements are required for this type of analysis. It is different in the case of the analysis of a small number 
of users where artificial intelligence is not required. In this case, the analysis of Heart Rate Variability, 
Electrodermal Activity and Breathing Rate could be considered as the most informative techniques since, even an 
ordinary wearable can easily acquaint stress levels. However, adopting a transdisciplinary approach in which these 
technologies could be supplied to more deeply, investigation of the relationship between users and the built 
environment is advisable. Although it is widely recognized that human beings respond cognitively, emotionally 
and physiologically to the built environment, on the other hand, interdisciplinary studies of the physiological well- 
being connected to immersive environments seem to be lacking, underlining a gap that if investigated could be 
promising for construction. In this regard, an arising field called "neuroarchitecture/neuro design" is raising the 
question of how architecture can benefit from its intersection with neuroscience. However, so far, few scientific 
practical studies have been pursued. For this reason, this conceptual model based on the adoption of practical 
methodologies represents a central challenge of the present time and is expected to help the digital design 
researchers to integrate well-being medical analysis into the design evaluation process. Therefore, the authors 
attempt to improve this conceptual model in future case studies. 


4. CONCLUSION AND FUTURE DEVELOPMENTS 


The ongoing debate about the increasing digitisazion has raised people’s awareness of the impact of the new 
immersive technologies. Although there is ample evidence in the scientific literature on how the living and working 
environment impact the psychophysiological states of the users, and despite a recent and ever-growing awareness, 
only limited attention has been paid to systematic research to find a quantitative tool for the detection of the effect 
of the virtual space characteristics on the psycho-physiology and perception of the users. The lack of a quantitative 
approach to evaluate users’ impact should be filled by the ability to adapt medical technologies to compelling 
digital design requirements. Despite growing interest in research into crossing findings from different investigation 
fields, the application of medical results in digital environment field seems to be defective as demonstrated in the 
present literature. However, focusing on the trending results could help to explore promising eXtended Reality 
areas. Thus, by moving forward to the concept of virtual environment design, the questioning of digital spaces 
should be fundamental to tackle real human well-being intended as mental and emotional health, especially in 
these current times where the world “metaverse” is increasing advancing. Until today the use of new technologies 
has been focused on physical built environment rather than on eXtended Reality spaces. 

Advanced research and interventions are necessary to deeply investigate the relationship between users and 
immersive environments. Currently, as far as we know at the time of writing, the use of bio-parameters in digital 
design is limited to few experimental trials mainly related to marketing field and not yet validated and introduced 
in the design practice as a validation method for immersive quality analysis. Therefore, if these technologies could 
be implemented and correctly integrated in digital design, the role of neuro-design would be enhanced. Medicine 
has a huge potential to inform design, and the impact of medical findings could be applied in digital design. A 
holistic and interdisciplinary approach could be largely adopted by opening the door to the fruitful pollination of 
different research fields with a clear goal: to design high quality digital environment place for wellbeing and the 
high-quality experience of the users. Following this, the authors have identified the applications within eXtended 
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Reality where the well-being analysis by using stress detection can make the best impact: real-time responsive 
design based on a human-centric approach; users’ well-being monitoring in immersive environments; eXtended 
Reality “certification” based on human perception; education tools for designers and users to sensitize them to the 
impact of digital design on users’ wellbeing; critical evaluation of extended reality design; real-time interpretation 
systems which arrange immersive experience variables such as illumination, length of experience, quality of design 
and soon. This conceptual framework aims to help, during the design process to ensure the proper attention to 
users’ well-being. In short, there is a long road to travel, and much work needs to be done. This review is expected 
to help designers to rethink the impact of the digital environment in the light of the tangibility of objective and 
measurable data based on the well-being of users. The COVID-19 pandemic forces us to be aware of digital design 
implications. For an optimal design experience, our article directs a spotlight on the need for the adoption of 
medical techniques to evaluate the physical and mental users’ well-being of the growing and ever evolving use of 
eXtended Reality. 
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ABSTRACT: This study examines the capability of an immersive virtual environment (IVE—based) experimental 
protocol to support occupant thermal state (sensation, acceptability, and comfort) data collection when 
participants wear face masks. Specifically, the goal is to see if there is a change in local thermal states due to face 
covering and would such a change affect overall thermal states. A between-subject experiment was conducted with 
fifty-four participants (27 masked; 27 unmasked) who were exposed to three-step temperatures (18.3°C, 23.8°C, 
and 29.4°C) in a climate chamber under both cooling and heating sequences. In masked IVE experiments, 
participants donned a face mask and viewed the chamber's virtual model on a head-mounted display. In contrast, 
in unmasked IVE experiments, participants didn't use a face mask. Skin temperatures and overall/local thermal 
state responses were collected during the experiments. They were then statistically compared between masked IVE 
and unmasked IVE experiments. The results suggest that forehead temperature was significantly different under 
all step temperatures in the cooling sequence, with mean forehead temperature being larger in masked IVE than 
in unmasked IVE experiments. Furthermore, in masked IVE experiments, thermal sensation in the forehead, neck, 
and upper-back increased while the thermal acceptability in those same skin sites decreased, but this difference 
was not Statistically significant. Also, in masked IVE experiments, the overall thermal sensation increased, whereas 
both the overall thermal acceptability and comfort decreased when compared with unmasked IVE experiments. 
Nonetheless, this difference was not statistically significant. To summarize, wearing a face mask didn't affect the 
participant's overall and local thermal states in IVEs, although few statistical differences were observed in skin 
temperatures. 


KEYWORDS: Immersive virtual environment, thermal sensation, thermal comfort, thermal acceptability, face 
masks. 


1. INTRODUCTION 


Immersive virtual environments (I[VEs) are a technology that combines software and hardware systems to produce 
a virtual or simulated environment that arranges sensory input in a way that makes the user feel as though they are 
inside the virtual environment. With the help of this sensory input, the user becomes cognitively engaged and 
interacts with the elements of the virtual environment (Radianti et al., 2020). Head-mounted displays (HMD) are 
the most popular method to deliver IVEs because they are easy to set up and provide a wide field of stereoscopic 
vision. In a true 1:1 scaled setting, IVEs generally provide a favorable environment for sophisticated data collection 
methods, allowing researchers to effectively modify desired variables and test hypotheses at lower costs and 
shorter experimental times (Alamirah et al., 2022). As a result, [VEs are more frequently employed to research 
how occupant perception and satisfaction with the tested conditions are affected by changes in ambient conditions 
(such as lighting settings) (Heydarian et al., 2016). Specifically, IVEs are used to study the occupant's thermal 
states (thermal sensation, acceptability, and comfort) by incorporating thermal conditions through the use of closed 
environments like climate chambers (Rentala et al., 2021). Studies have also examined how well IVEs can simulate 
actual physical settings when evaluating users' comfort (Yeom et al., 2019). Other researchers used IVE to look at 
how people's perceptions of their indoor environment are influenced by psychological, physiological, and 
environmental factors (Chinazzo et al., 2020). 


The general IVE experimental protocol for studying occupant thermal states usually consists of subjecting the 
participants to a building design in IVE while simultaneously manipulating environmental conditions such as 
operative temperatures, humidity, etc., in a test environment (e.g., climate chamber) and collecting their 
physiological (e.g., skin temperature, heart rate, etc.) and thermal perception responses (e.g., thermal sensation, 
acceptability, and comfort) (Alamirah et al., 2022). Normally, during these experiments, the participants do not 
wear any face coverings or face masks. However, there are certain situations where the IVE experiments must be 
performed with the participants wearing a face covering or a mask for health and safety purposes, such as during 
the COVID-19 pandemic. This mask-wearing presents new challenges for occupant thermal state experiments 
using IVEs. Face mask usage affects the respiratory system. It specifically interferes with breathing normally, 
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causing some carbon dioxide to be expelled and some inhaled throughout each breathing cycle (Lazzarino et al., 
2020). Moreover, wearing a mask will directly lower the amount of oxygen inhaled by the body and prevent heat 
transmission between the facial region and the environment (Hu et al., 2022). Also, when wearing an HMD, the 
participant's face is already covered, which can trap heat and cause discomfort (Mehrfard et al., 2019). Adding a 
face mask on top of this can exacerbate the heat buildup, affecting a person's overall thermal state. As a result, it 
is reasonable to assume that using a face mask during IVE experiments may significantly impact how participants’ 
thermal sensation, acceptability, and comfort are evaluated, thereby impacting the validity of IVE experiments. 
Therefore, it is necessary to investigate the effect of using face masks on participants’ thermal states in IVE 
experiments. In this study, human subject experiments were performed in IVE under three-step temperatures in a 
climate chamber with participants who wore face masks (referred to as masked IVE experiments) and participants 
who did not wear face masks (referred as unmasked IVE experiments). Local skin temperatures and both the 
overall and local thermal state responses were collected. We hypothesize that the local skin temperatures, local 
thermal states and overall thermal states between the masked IVE and unmasked IVE experiments will differ 
significantly. The results of this analysis would provide new insights into developing IVE experimental protocols 
for occupant thermal state research particularly regarding the mask use. That is, whether to modify the 
experimental protocol to account for mask wearing or continue using the existing one when face masks needed to 
be used during the experiments. 


2. METHODOLOGY 
2.1 Participants 


This study was approved by the university's Institutional Review Board. A total of fifty-four participants were 
enlisted for this study. Half of the participants did the masked IVE experiments, and the other half did the unmasked 
IVE experiments. Participants in both experimental groups are divided roughly equally by gender and age, with 
15 men and 12 women participating in masked IVE experiments with a mean age of 21.9 years and 14 men and 
13 women participating in unmasked IVE experiments with a mean age of 22.3 years, respectively. This was done 
to ensure that gender and age would not affect the results when statistically comparing the two groups. 


2.2 Immersive Virtual Environment 


Both masked and unmasked IVE experiments were carried out inside a climate chamber that was located on the 
university's campus. An immersive virtual environment of the climate chamber was delivered via an HTC Vive 
head-mounted display device. The chamber's 3D model was produced with Autodesk 3ds Max. The model, 
together with the material textures and lightmaps, was loaded into Unreal Engine 4, as shown in Figure 1. The 
climate chamber offers space heating and cooling in an IVE experiment for measuring occupant thermal states. At 
the same time, the users observe the virtual world of the chamber interior via a head-mounted display (HMD). 


ae 


Fig. 1: Climate chamber's virtual environment 


101 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


2.3 Experiment Procedure 


Between-subject experiments were conducted using the procedure outlined in Figure 2. After the participants 
signed the consent forms, they were given a demographics survey and were asked to arrive at the chamber in a 
specific set of clothing (clo of 0.5-0.6), which included trousers and a T-shirt or a long-sleeve shirt. After coming 
to the chamber, the participants were tested for cigarette or alcohol use using the pre-experiment screening survey 
in the chamber's resting area, where the temperature and relative humidity were set to 75°F/23.8°C and 50% RH, 
respectively. Participants were excluded from the study if they were found to have used cigarettes or alcohol. The 
screening took about 10 minutes to complete and allowed the participants to adjust to the chamber's temperature 
and lessen the impact of their previous thermal state. Then, the participants were asked to enter the chamber's 
testing area and have the skin temperature sensors (Vernier surface temperature sensors; accuracy: + 0.5 °C, 
resolution: 0.1 °C) attached to their bodies at eight locations, i.e., forehead, neck, chest, upper back, forearm, hand, 
calf and foot. 


Consent Form and Demographics Survey! x i 


| i | 
| | | | 
Pre-Experiment Screening Survey! esi x | Record Continuously 
Skin Temperatures at 8 Local Skin Sites | | a a id 
Overall Thermal States | | i xl xi i 
l 
Thermal States at 8 Local Skin Sites | l | X] X] X] 


| | l | aan Testing Area 
Cooling Sequence (Masked IVE), 


| 


a | ls Ë 8) | 85°F/29.4°C | | 75°F/22 3.8°C | OS°E/18. 3°C 
Heating Sequence (Masked IVE)| E £ | | = D “ et 65°F /18.3°C| 75°F/23.8°C| 85°F/29.4°C | 
Cooling Sequence (Unmasked TVF)! Š Z i = = a S 185°F/29. acl 75°F/23. 8c] 65°F/18. secl 
Heating Sequence (Unmasked IVE fal | | 2 © (65°F /18.3°C| 75°F /23.8°C | 85°F /29.4°C 
Time 10 min 15 min 15 min 15 min 


Fig. 2: Experiment procedure 


Twenty-seven participants participated in masked IVE experiments, and another twenty-seven participants 
participated in unmasked IVE experiments. In masked IVE experiments, the participants wore a face mask that 
entirely covered the mouth and nose area and viewed the chamber's virtual model through the HTC Vive device, 
as shown in Fig. 3 (left). Whereas in unmasked IVE experiments, the participants only viewed the chamber's virtual 
model without wearing a face mask (Fig. 3 (right)). Also, both masked and unmasked IVE experiments consisted 
of cooling and heating sequences that were conducted at least two weeks apart. The cooling sequence had a 
decrease of three-step temperatures, i.e., 85°F/29.4°C —> 75°F/23.8°C —> 65°F/18.3°C and heating sequence had 
an increase of three-step temperatures, i.e., 65°F/18.3°C —> 75°F/23.8°C —> 85°F/29.4°C. So, overall there were 
four experimental sessions, i.e., (1) masked IVE in the cooling sequence, (2) masked IVE in the heating sequence, 
(3) unmasked IVE in the cooling sequence, (4) unmasked IVE in the heating sequence. The order of the four 
experimental sessions was random to reduce the order effect and was conducted with a set humidity of 55% RH 
and a CO): limit of 1,000 ppm. From the start of each trial until the end, the indoor control temperature around the 
participants (sensor placed at the height of 24 inches from the floor (ASHRAE, 2013)) and their skin temperatures 
were continuously recorded at one-second intervals. Following the stabilization of the indoor control temperature 
at each step temperature, the participants were subjected to that stabilized temperature for about 5 minutes, and 
then their overall and local thermal state votes were recorded. The thermal states included responses for thermal 
sensation, thermal acceptability, and thermal comfort. For local thermal states, only thermal sensation and thermal 
acceptability were recorded at the exact eight locations where the skin temperatures were sampled from. The 
ASHRAE Standard 55 Thermal Comfort seven-point scale was used to record overall and local thermal sensation 
(ASHRAE, 2013). In contrast, six-point scales were used to record the overall thermal comfort and overall/local 
thermal acceptability (Rentala et al., 2021). 
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Fig. 3: An experimental session with mask (left) vs. without mask (right). 


2.4 Data Processing 


After completing the experiments, the mean of the skin temperatures at eight sites and the indoor control 
temperature for each participant was calculated using the last five-minute data (i.e., data from when the control 
temperature stabilized to the end of the end thermal state surveys). This 5-minute averaged data was used for all 
statistical analyses. Furthermore, the mean indoor control temperature was statistically compared between masked 
IVE and unmasked IVE experiments under all step temperatures in both cooling and heating sequences to ensure 
that the indoor temperature was properly controlled and remained the same in both sets of experiments. A two- 
tailed independent sample T-test was used for comparisons. The tests revealed that the p-values in all the cases 
were not statistically significant (p > 0.05), indicating that the control temperature in the masked IVE experiments 
was comparable with unmasked IVE experiments. 


3. RESULTS 


Several statistical tests were performed to test the hypothesis that the local skin temperatures, local thermal states, 
and overall thermal states were significantly different between masked IVE and unmasked IVE experiments under 
all the step temperatures in both cooling and heating sequences. Independent sample T-tests were used to compare 
the skin temperatures collected at eight local sites. Wilcoxon Rank Sum tests were used to compare the local and 
overall thermal state responses. All statistical tests were performed at the significance threshold of 0.05. 


3.1 Skin Temperature 


Table 1 shows the results where the mean forehead temperature significantly differed (p < 0.05) between masked 
and unmasked IVE experiments under all step temperatures in the cooling sequence. Also, in the cooling sequence, 
the forehead temperature was higher in masked IVE than in unmasked IVE by an average of 0.5°C under all step 
temperatures. However, no significant differences in the forehead temperature were observed in the heating 
sequence under all the step temperatures, even though the forehead temperature under all step temperatures was 
higher in masked IVE than in unmasked IVE by an average of 1.06°C in the heating sequence. In addition, no 
significant differences were observed in skin temperatures at other sites (i.e., neck, chest, upper back, forearm, 
hand, calf, foot) between masked and unmasked IVE experiments under all step temperatures in both cooling and 
heating sequences. However, like forehead temperature, the neck, chest, upper back, forearm, hand, calf, and foot 
temperatures were higher in masked IVE than in unmasked IVE experiments by an average of 0.55°C, 0.57°C, 
0.49°C, 0.66°C, 0.52°C, 0.48°C, 0.69°C under all step temperatures in both sequences. 
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Table 1: Independent sample T-test results of skin temperatures 


Masked IVE Unmasked IVE 
Experiment Step 
Skin Sites Mean p 
sequences Temperature sD Mean (°C) SD 
CC) 

65 °F/18.3 °C Forehead 36.75 0.43 36.13 0.71 0.001 
Cooling 75 °F/23.8 °C Forehead 36.7 0.45 36.17 0.58 0.001 
85 °F/29.4 °C Forehead 36.49 0.41 35.99 0.49 0.001 


3.2 Overall Thermal States 


The mean overall thermal sensation votes were higher in masked IVE experiments than in unmasked IVE 
experiments by 0.44, 0.36, 0.4, 0.46, 0.27, and 0.38 (with an average of 0.38) in all step temperatures under both 
cooling and heating sequences (Fig. 4). This indicates that wearing a mask increases the overall sensation. Still, 
this increase was not statistically significant (p > 0.05) in all conditions. This result is corroborated by an earlier 
study in a climate chamber that did not use IVE (Yoshihara et al., 2021). On the other hand, in masked IVE 
experiments compared to unmasked IVE experiments, the mean overall thermal acceptability votes were lower by 
0.28, 0.96, 0.72, 0.26, 0.58, and 0.04 (with an average of 0.47) across all step temperatures during both cooling 
and heating sequences (Fig. 5). Similarly, the mean overall thermal comfort votes were lower by 0.08, 1, 0.44, 0, 
0.75, and 0.87 (with an average of 0.52) in masked IVE experiments when compared with unmasked IVE 
experiments across all step temperatures during both cooling and heating sequences (Fig. 6). These findings 
indicate that wearing a mask reduces the overall thermal acceptability and overall thermal comfort. However, 
similar to the results observed for overall thermal sensation, the decrease in overall thermal acceptability and 
thermal comfort was not statistically significant in all conditions. These results are consistent with an earlier study 
conducted in a non-IVE climate chamber (Zhang et al., 2021). 
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Fig. 4: Mean overall thermal sensation votes. 


104 


SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


2.0 


15 ¢ 
| ETE a 


Mean Overall Thermal 
Acceptability Votes 


0.0 
ott f 
-1.0 
LO LO O L O ë O O O O O O WO 
O onrnr wovnoeToo ~- - DW OO 
w u Gr wo wl 
2=2r2e222222222 
T ÄāåT TTU Ve SU Be VP BV UV p 
o o o o 8 Ob OB OB OB DB BD DW 
a æ 2 © OD © SC Oe Oe 2 Oe 
n fF HF Hh Hh WH Hw HH Hw DW WD WM 
o uuo UT Oe OOlCUlUCrOrTOCUCUOCOCUC TTC hH]HTUC«wWM 
SESzSEZSPESEZSEaSE 
2r>25 25 252525 
32323 2 PRB PRB 
Ooo GO o © oo Bw = wo = ss 
oO oO [e] w w w 
Oo Oo oO Fo m E 


Fig. 5: Mean overall thermal acceptability votes. 
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Fig. 6: Mean overall thermal comfort votes. 
3.3 Local Thermal States 


The thermal sensation and thermal acceptability at the eight local skin sites were also analyzed. The mean thermal 
sensation on the upper body, specifically at the forehead, neck, and upper back, increased by an average of 0.5, 
0.18, and 0.24, respectively, in masked IVE when compared with unmasked IVE under all step temperatures in 
both cooling and heating sequences (Fig. 7). This finding is partially supported by a prior non-IVE study where 
they reported higher mean thermal sensations at only forehead and upper back (Tang et al., 2022). Also, the reason 
for the thermal sensation increase is that mask use can affect the frequency of breathing, leading to heat buildup 
around the face and neck area, causing the participants to feel warmer (Zhang et al., 2021). On the contrary, the 
thermal acceptability at those same three skin sites decreased by an average of 0.62, 0.23, and 0.16, respectively, 
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sequences (Fig. 8). However, the increase in thermal sensation and decrease in thermal acceptability at those three 
skin sites were not statistically significant (p > 0.05) in all step temperatures under both cooling and heating 
sequence even though the forehead temperature was statistically significant in the cooling sequence (Table 1). 
Similarly, thermal sensation and thermal acceptability at other skin sites (chest, forearm, hand, calf, and foot) were 


in masked IVE when compared with unmasked IVE, under all step temperatures in both cooling and heating 
also not statistically significant (p > 0.05) between masked IVE and unmasked IVE experiments. 
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Fig. 7: (Left to right) Mean forehead, neck and upper back thermal sensation votes. 
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Fig. 8: (Left to right) Mean forehead, neck and upper back thermal acceptability votes. 
106 


4. CONCLUSIONS AND FUTURE STUDIES 


The study shows that wearing a face covering or mask while performing IVE experiments did not significantly 
affect the participants' overall thermal states as well as their local thermal sensation and acceptability at the eight 
skin sites. In other words, an appropriately designed experimental approach can support IVE experiments 
involving face masks. This approach should include precisely regulating the indoor test environment (e.g., 
temperature and humidity), offering an adequately designed virtual environment that induces high immersion with 
minimal motion sickness, and closely monitoring the participants throughout the experiment to make sure the 
masks are fitted correctly and do not interfere with the experimental apparatus (e.g., sensors). Even though the 
results were not statistically significant, small differences were observed in the mean votes, such as higher overall 
thermal sensation and lower overall thermal acceptability and comfort in masked [VE compared to unmasked IVE 
experiments. Higher thermal sensation and lower thermal acceptability were observed at the forehead, neck, and 
upper back in masked IVE compared to unmasked IVE experiments. Furthermore, higher temperatures were 
observed at all eight skin sites in masked IVE than in unmasked IVE under all step temperatures in both sequences. 
But these results were not statistically significant except at the forehead in the cooling sequence. Also, the increase 
in forehead temperature did not affect the forehead sensation, acceptability, or overall thermal states. While the 
results are noteworthy, they may be affected by some limitations. Firstly, the sample size within both the masked 
and unmasked groups was relatively small (n = 27). Therefore, the lack of statistical significance in the results 
could be attributed to the small sample size. Future investigations should aim to explore the impact of face masks 
on the validity of the IVE experimental protocol using a larger sample size. Secondly, this study only accounted 
for indoor air temperature and skin temperature, and future research may extend its scope to incorporate other 
environmental and physiological factors, such as the impact of relative humidity, air velocity, skin electrodermal 
activity, and heart rates. Finally, future studies may test the validity of the IVE experimental protocol involving 
mask use in different outdoor temperature conditions or seasons. 
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ABSTRACT: Augmented reality (AR) still struggles to be widely used in real processes in the construction industry 
despite its great potential. This is partly due to the difficulties that exist in aligning holograms and maintaining 
their stability, especially for outdoor applications. In addition, being indoor-outdoor interactions crucial for built 
environment management, it would be important that AR apps can work seamlessly. Alignment in indoor 
environments cannot make use of methods such as GNSS, nor can all environments be assumed to have been 
previously initialized with AR tools. Thus, marker-less AR registration is crucial for indoor applications. This 
paper presents an approach for marker-less AR registration seamlessly in both outdoor and indoor environments. 
Real-time kinematic positioning (RTK) and Inertial Measurement Units (IMU) technologies have been chosen for 
outdoor registration, while image comparison based on convolutional neural networks (CNN) for indoor 
registration. In this research, the application of these two technologies and their integration have been studied 
and tested on site on a real Facility Management use case related to a university campus. The proposed approach 
has shown very promising results in displaying BIM elements of the electrical system seamlessly superimposed 
through AR to their physical counterparts in mixed indoor-outdoor environments. 


KEYWORDS: Augmented Reality, Seamless Registration, Feature Matching, Pose Estimation, Real-Time 
Kinematic, Facility Management. 


1. INTRODUCTION 


The AECO industry, although commonly recognized as one of the least digitized industries, is increasingly moving 
towards embracing more and more computer-based technologies to provide better performance in various stages 
of buildings lifecycle (Albahbah et al., 2021). The Operation and Maintenance (O&M) phase of Facility 
Management (FM) accounts for the largest proportion of the whole life costs of the building process (Salman & 
Ahmad, 2023). The costs of the O&M phase represent 50-70% of the total annual facility operating costs and 85% 
of the entire lifecycle cost of a building. Ever since the facilities started to become more complex, the day-to-day 
tasks have also become more difficult. In fact, the increased need of the construction industry for visualization 
technologies arises from the complex nature of the industry and its high demand for information access for 
assessment, communication, and collaboration. Lack of coordination between facility managers and field workers 
results in delays and cost overruns which could easily be avoided with better coordination and visualization tools. 
In this domain, Augmented Reality (AR) technologies can be used as visualization tools for facilities’ O&M tasks 
and can provide significant advantages. AR impacts the mobile computing industry by radically changing the type 
of interaction between humans and computers. In fact, such technology creates direct, automatic, and practicable 
connections between the physical world and digital information by providing a simple and immediate user 
interface to a digitally enhanced physical world. Since AR allows virtual objects to be simultaneously 
superimposed on the real world, it helps to locate and view the occluded facilities and equipment and provides 
maintenance guiding instruction for the field workers (Salman & Ahmad, 2023). Through a hand-held device 
(HHD) or head mount device (HMD), it augments the real world by making implicit information apparent to the 
user when required. Many journal publications demonstrate that AR technology can be applied to various domains 
in AECO industry, especially in the O&M phase of the project lifecycle (Baek et al., 2019; Jurado et al., 2021; 
Naticchia et al., 2021; Vaccarini et al., 2022). For example, current maintenance practices are characterized by 
scattered and disoriented facility information that the maintenance staff must fetch through specifications, 
maintenance reports, and checklists. In fact, 50% of the on-site maintenance time is still spent on localizing and 
navigating targets inside a facility (Salman & Ahmad, 2023). Even after locating the target, maintenance staff must 
put additional effort into seeing the target as it could be concealed in the case of piping, overhead ducts or behind 
a wall. 


AR poses a number of demanding technological requirements for its implementation (Costanza et al., 2009). One 
challenge is related to display technology, which has registered remarkable breakthroughs in the last decades. 
Precise position tracking constitutes another significant challenge. In order to give the illusion that virtual objects 
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are located at fixed physical positions or attached to physical items, the system must know the position of relevant 
physical objects relative to the display system. Since the earliest days, spatial registration has been considered as 
one of the most important technical aspects of an AR systems and is considered a core part of AR functionality 
(Albahbah et al., 2021; Salman & Ahmad, 2023). Spatial registration can combine virtual objects and the real 
environment with the correct spatial perspective relationship by calculating the corresponding relation of both the 
virtual world and the real-world coordinate systems (Cheng et al., 2020). More in detail, spatial registration is 
responsible for calculating the user's correct spatial position and orientation in accordance with the real-world 
coordinate systems (Albahbah et al., 2021). 


The spatial registration methods are generally classified into two categories: “marker-based” and “marker-less” 
methods. The first one is considered the most widely used spatial registration method. Markers can be 2D images 
with visual features or natural 3D objects in the real environment (Cheng et al., 2020). The high use of this method 
may be returned to the simplicity, efficiency, and convenience of image recognition for superimposing virtual 
objects to the real world. Image recognition methods rely on extracting features from images instead of using 
complicated algorithms for calculating the relationship of relative positions. A similar approach can be applied 
with invisible markers, such as infrared and RFID ones (El Barhoumi et al., 2022). With marker-less approaches, 
instead of tracking features of markers, localization technologies are used to control the relative position between 
the real environment and virtual objects. GNSS is the most popular marker-less localization technology due to its 
suitability for use in a large open area such as a construction site and the ease of its signal receiving by common 
mobile devices (Cheng et al., 2020). According to the official US Government information on GNSS, the user 
range error (URE) for civil commitments cannot reach lower than 0.8 m (Cheng et al., 2020). The localization 
accuracy of GNSS is much worse in indoor environments, given that the buildings block the GNSS signal. The 
low accuracy of GNSS is not suitable for activities that require high accuracy or that mainly occur indoors. 
Compared with GNSS, some other marker-less localization technologies, such as Wi-Fi, Ultra-wideband (UWB), 
Inertial Measurement Unit (IMU), and Simultaneous Localization and Mapping (SLAM) can provide higher 
accuracy and can be applied to indoor activities. Another marker-less category is represented by vision-based 
methods using natural features for registration purposes (El Barhoumi et al., 2022). Each of these methods for 
spatial registration has its limitations in either accuracy or practicality. To promote the application of AR in the 
FM, which typically involves both indoor and outdoor environments, an advanced localization method that can 
provide an accurate and seamless registration in heterogeneous scenarios is needed. In order to cover this gap, a 
marker-less localization system for seamless indoor/outdoor AR registration has been developed by defining a 
cloud platform that hosts an indoor registration engine, an outdoor registration engine, plus a switch engine that 
manages the priority between the two. The developed system, tested on site on a real FM use case related to a 
university campus, has shown very promising results. The remainder of this paper is structured as follows. In 
Section 2, a literature review is presented. Section 3 reports the methodology adopted for the development of the 
proposed system. In Section 4, experiments design and execution on a FM use case are presented. Finally, Section 
5 is devoted to results discussion and conclusions. 


2. LITERATURE REVIEW 


In this section, a literature review concerning existing AR registration methodologies applied to both indoor 
(Section 2.1) and outdoor (Section 2.2) environments is reported. Understanding strength and eventual gaps of 
approaches proposed by past studies and commercially available solutions paved the way to the definition of the 
indoor/outdoor seamless registration system proposed by this study. 


2.1 Indoor AR registration 


In the AECO industry, several AR registration methodologies for indoor applications have been developed and 
tested so far. Past studies have exhaustively tested marker-based approaches using visual markers distinctive in the 
scene (Lee & Akin, 2011; Park et al., 2013). Even though artificial markers are advantageous in terms of robustness 
in detection, they should be installed all over the facility before on-site activities, such as the actual FM, occur. In 
addition, visual markers can trigger aesthetic issues because of their distinctive appearance. Alternative solutions 
are represented by invisible markers, such as infrared (Kuo et al., 2013) and RFID (Carbonari et al., 2022; 
Naticchia et al., 2021), and natural markers (Koch et al., 2014) that do not aesthetically change the scene. However, 
even though invisible markers do not have aesthetic issues, they should be pre-installed. Natural markers, instead, 
have the limitation of depending on signs, including exit signs, fire extinguisher signs, and textual information 
signs. If the scene does not have such designated signs, the localization can be restricted (Baek et al., 2019). 
Commercial AR libraries, such as Vuforia (PTC Products, 2023), ARcore (Google LLC, 2023), and World Locking 
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Tools (WLTs) (Microsoft, 2023) have been tested in indoor environments (Ashour et al., 2022; El Barhoumi et al., 
2022). Comparative tests in indoor environments of Vuforia Image Target and WLTs showed better results of the 
second ones (Teruggi & Fassi, 2022). 


Among marker-less methods, GNSS-based AR systems have been largely studied (Kim et al., 2013). However, 
they are considered inappropriate for indoor applications because of their low accuracy (Chen et al., 2019). 
Therefore, many studies have employed the Wi-Fi fingerprinting technology for indoor localization purposes 
(Ahmad et al., 2020; Chen et al., 2019). This approach loses accuracy in the case of multiple mobile devices. In 
fact, the localization accuracy of the order of 1 m, ensured by Wi-Fi-based collaborative systems (Chen et al., 
2019), can be still improved. Another marker-less methodology for indoor localization is based on image 
comparison. Image-based localization is classically tackled by estimating a camera pose from correspondences 
established between sparse local features (Ethan Rublee et al., 2011) and a 3D Structure-from-Motion (SfM) 
(Schönberger & Frahm, 2016) map of the scene (Li et al., 2012). Image comparison methodologies are classified 
into direct-matching and image-retrieval methodologies (Baek et al., 2019). Direct-matching methodologies do 
not render images for dataset, but directly find correspondences between 3D structure and the queried image 
(Humenberger et al., 2020). This pipeline scales to large scenes using image retrieval (Cao et al., 2020). Image- 
retrieval methodologies attempt to find the closest image to the queried image among the preliminarily prepared 
dataset. The dataset images can be preliminarily collected by photographs or rendered from three-dimensional 
structure estimation. Recently, many of these steps or even the end-to-end pipeline have been successfully learned 
with neural networks (DeTone et al., 2017; Lindenberger et al., 2023; Sarlin, Cadena, et al., 2018). This approach, 
although may lose accuracy whenever there is lack of context or repetitive elements, has shown great potential to 
develop indoor AR registration apps with applicability for non-expert users (Baek et al., 2019). 


2.2 Outdoor AR registration 


AR registration in outdoor environments presents distinct challenges compared to those encountered in indoor 
environments. Tracking and alignment approaches such as SLAM enable the placement of virtual objects relative 
to a local reference frame. However, the reliability of such tracking approaches in open environments is 
compromised due to (i) expanded spatial dimensions, (ii) the absence of readily available reference points, (iii) 
computational costs in large environments, and (iv) the dynamic nature of external spaces that undergoes frequent 
changes. Therefore, the AR registration approach in outdoor environments cannot solely rely on local reference 
systems but must necessarily be based on an absolute reference system, enabling the determination of the 
geographic pose of both the user and virtual objects (Cyrus et al., 2019; Ling et al., 2019; Marchand et al., 2016). 
To this end, hybrid AR registration approaches, based on the combined use of IMU and high-precision Global 
Navigation Satellite System (GNSS) such as the Real-Time Kinematics (RTK), have been pursued for the 
visualization of underground pipelines and subsurface data (Hansen et al., 2021; He et al., 2006; Roberts et al., 
2002), for urban navigation (Guarese & Maciel, 2019; Zhao et al., 2016), for agricultural vehicle navigation (Kaizu 
& Choi, 2012), and for the alignment of multiple smaller maps from an existing SLAM tracking system (Ling et 
al., 2019). Even from a commercial standpoint, there are currently not many solutions available that ensure the use 
of AR apps in outdoor environments either without relying on some kind of additional infrastructures (e.g., markers, 
QRcode, RFID, beacons, etc.) or without the need of manual/semi-manual alignment procedures, with some 
exceptions. For instance, Trimble Site Vision (Trimble Inc., 2023) makes use of the built-in GNSS receiver to 
achieve | centimeter of horizontal accuracy under RTK coverage. Similarly, Engineering-grade AR for AEC (vGIS 
Inc., 2023) developed by vGIS achieves the same centimeter-level accuracy under RTK coverage. In this case the 
RTK antenna is not directly integrated into the system but needs to be obtained from third-party vendors. However, 
relying on GNSS technology only means that the system cannot cope with urban-canyon scenarios and indoor 
environment. Due to these limitations and the persistently high costs, these solutions have not yet experienced 
widespread adoption in the construction industry. Delving deeper into this last notion, it is noteworthy that the 
individual components of RTK receivers are economically affordable, thereby fostering the proliferation of 
applications of this technology (Hansen et al., 2021). 


2.3 Research questions 


As demonstrated by the literature review reported in Sections 2.1 and 2.2, several AR registration methodologies 
exist. Limitations of existing indoor AR registration approaches must be considered. Marker-based approaches 
share the limitation of requiring a preliminary survey to install markers or the existence of signs to be used as 
natural markers (Baek et al., 2019). Among marker-less approaches, GNSS-based solutions are inappropriate for 
indoor applications because of the weakness of GNSS signals (Chen et al., 2019), whereas Wi-Fi-based approaches 
lose localization accuracy in case of multiple mobile devices (Salman & Ahmad, 2023). On the other hand, image- 
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based systems, although may lose accuracy whenever there is lack of context or repetitive elements, show great 
potential to develop indoor AR registration applications with applicability for non-expert users (Baek et al., 2019). 
With reference to outdoor AR registration approaches, limitations of marker-less GNSS-based approaches must be 
considered. First of all, the reduced reliability of GNSS in urban-canyon scenarios (e.g., proximity to urban 
elements, such as buildings, roofs, trees, and so on) limits possibilities of applications (Cheng et al., 2020; Ling et 
al., 2019). In addition, GNSS-based solutions are currently expensive (especially for high-precision RTK GNSS 
systems). Despite this, since single RTK receivers’ components are becoming available at affordable prices, the 
development of in-house devices is showing promising growth (Hansen et al., 2021). Finally, outdoor AR 
registration approaches are affected by the lack of integration to indoor scenarios, except through the use of 
additional supporting infrastructure (such as beacons) that constrain the deployment area (Cheng et al., 2020). 
Considering that indoor-outdoor interactions are crucial for managing the built environment, it would be important 
that AR applications can work seamlessly even during changes in environment. In addition, in order to ensure a 
wider applicability of AR, preliminary set up procedures for registration should be as simple as possible. In order 
to cover these gaps, this study aims to answer the following research questions: 


RQI What system architecture would ensure a seamless AR indoor/outdoor registration? 
RQ2 What technical solutions would make AR registration “plug-and-play” for wider applicability even 
among non-expert users? 


3. METHODOLOGY 
3.1 System architecture 


In order to answer the research questions (Section 2.3), the system architecture, reported in Fig. 1, has been defined. 
The proposed architecture is built on top of a BIM Cloud Platform which hosts the following four elements, each 
playing a crucial role in the overall functionality: (i) the Common Data Environment (CDE), (ii) the Outdoor AR 
Registration Engine, (iii) the Indoor AR Registration Engine, and (iv) the Switch Engine (Fig. 1). The BIM Cloud 
Platform serves as a centralized resource and processes hub that facilitates data processing, storage, and 
distribution. An important characteristic of the platform is its ability to host, localize, and align BIM models, 
images, and point clouds within a geospatial context. This geolocation feature enables the precise mapping of 
virtual assets and features to their corresponding real-world locations, facilitating the integration between the 
virtual and physical realms. One of the key responsibilities of the BIM Cloud Platform is therefore to manage the 
alignment processes. Particularly, the positioning of images within the platform is achieved by referencing the 
absolute world coordinates of the acquisition point, along with the accurate rotations. This process ensures that 
images (and point clouds) are precisely georeferenced and aligned to their real-world locations. This approach lays 
the foundation for understanding the subsequent paragraphs, which delve into the concept of “images in the vicinity 
of the user's position”. 


The CDE is responsible for structured (e.g., .ifc files) and unstructured (e.g., images) data storage. It facilitates 
accessibility to AR applications through dedicated clients. In this work, the CDE of the DICEA Department of the 
Universita Politecnica delle Marche has been used. At its core, a graph database provides a resilient backbone, 
offering efficient storage, retrieval, and traversal of interconnected data elements. The next integral components 
are the two distinct registration engines, specialized for outdoor and indoor environments, respectively. The 
Outdoor AR Registration Engine, which relies on the combination of RTK GNSS and IMU systems, is tailored to 
tackle the unique challenges presented in open spaces, such as dealing with the absence of reliable reference points 
and coping with large and dynamic environments. On the other hand, the Indoor AR Registration Engine is 
designed to excel in environments characterized by restricted access to GNSS signals, leveraging features like 
point clouds and aligned images to achieve accurate positioning. To this purpose, convolutional neural networks 
(CNN) (Sarlin, Cadena, et al., 2018; Sarlin, Debraine, et al., 2018) that simultaneously predict local features and 
global descriptors have been applied for accurate 6-DoF localization. Finally, the pivotal feature of this system lies 
in the Switch Engine, which effectively serves as an integrator between the outdoor and indoor registration engines. 
It is a rule-based engine that assesses the availability of either GNSS signals, or features, or both, and dynamically 
switches between the two registration approaches to maintain a consistent and uninterrupted AR experience. By 
synergistically combining the aforementioned four elements within the BIM Cloud Platform, the system 
architecture (Fig. 1) delivers a robust and marker-less AR system that can seamlessly adapt to both indoor and 
outdoor scenarios (i.e., answer to RQ1). Although the approach and methodology proposed in this paper are 
applicable to both head-mounted and hand-held AR devices, the focus from this point forward will be specifically 
on the usage of Microsoft’s AR tool, HoloLens2. To this end, a novel addition to the tool is introduced, developed, 
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and physically realized, enabling a robust connection between the HoloLens2 device and the RTK receiver for 
precise calibration between the systems. The presented system architecture has been implemented in an AR 
application for HoloLens2 developed using the C# programming language and the serious game engine Unity3D. 


BIM Cloud Platform 


Point Clouds and 
referenced Images 


BIM models 


Outdoor AR Registration Engine Indoor AR Registration Engine 
ae 


Switch Engine 
D 


HF-Net + PnP 


E) 
y 
World coordinates (oen). $ 


Fig. 1: Architecture of the proposed system for seamless outdoor and indoor AR registration. 


World coordinates 


3.1.1 Outdoor AR Registration Engine 


Following the works made by Hansen et al. (2021) and Ling et al. (2019), the Outdoor AR Registration Engine 
proposed in this paper relies on the combination of a RTK GNSS tracking system and the HoloLens?’ built-in 
inertial tracking system. By aligning the local frame reference of the HoloLens2 device to global coordinates 
exploiting the RTK measurements and using the HoloLens2 capability to localize itself in the environment through 
a real time mapping service, a geographical SLAM algorithm can be developed in order to have an absolute 6- 
DoF localization of the AR device and therefore aligned virtual objects. Some considerations must be made: 


e a general 3D object in the BIM Cloud Platform has its own reference frame located in the world that is 
fully specified by its geographical coordinates and its orientation with respect to the North: Latitude (°), 
Longitude (°), Altitude (m), and Azimuth (°). Object’s coordinates refer to the WGS-84 standard; 

e the coordinates retrieved from the RTK system are based on the WGS-84 standard; 

e the RTK receiver has been developed in-house with contained costs in the perspective of widespread 
adoption; 

e the RTK receiver and the HoloLens2 must be solidly connected to each other; 

e the RTK system is reliable as long as the receiver is within 10 km of the RTK base station antenna; 

e the HoloLens?’ local coordinate frame originates at the point where the AR application is turned on. 


The problem of placing world-referenced 3D BIM objects into the HoloLens2 local frame can be achieved by 
fulfilling the following steps (Fig. 2): (i) aligning the local frame with the North direction, (ii) adjusting the object 
position, (iii) adjusting the object altitude, and (iv) placing objects based on the distance from the observer. The 
resulting Outdoor AR Registration Engine’s process (Fig. 2) is automatically executed (i.e., answer to RQ2). Hence, 
the outdoor registration process, not requiring any particular action from the user, can find applicability even 
among non-expert users. 


GET SAMPLES COMPUTE AZIMUTH ADJUST POSITION 
from: Compute azimuth to align AND ALTITUDE 


. RTK GNSS the local frame to the Update objects position 
. IMU North direction 


Fig. 2: A schematization of the outdoor AR registration engine’s processes. 


The first part of the outdoor engine involves the initialization phase of the HoloLens2 position. It includes 
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acquiring initial samples from both the RTK and IMU systems. Once initialized, the RTK system provides absolute 
3D coordinates of the body frame (Latitude 1, Longitude A, and Altitude h,), while the IMU provides for local 
3D coordinates (x, y, z) and rotations of the body frame. The local equirectangular projection must have y axis 
directed toward the North. However, at the beginning, the HoloLens2’ local frame has an arbitrary unknown 
orientation with respect to the North. To solve this, when moving between two positions, the performed segment 
has an orientation p’ with respect to the y’ axis but the same movement forms a bearing angle p} with respect to 
North as shown in Fig. 3. 


Fig. 3: Bearing angles with respect to North direction. 


Given latitude and longitude of the start point (91, 41) and latitude and longitude for the end point @2, A, of a 
straight line along a great-circle area, the initial bearing (sometimes referred to as forward azimuth) can be 
computed as follow: 


in(Az—A1)cos(~2) 
= atan2 (re vente) ____) 1 
P cos(p1)sin(p2)-sin(p1)cos(p2)cos(A2—21) (4) 


The geographical position of its reference frame (Po, 9) can be computed by inverting the reverse projection: 
Ap = 4' - — 


(2) 


Po = 9' -7 (3) 


Rcos@y 


assuming that pọ = Qı and where (¢~',/') are the GNSS coordinates and R is the radius of the globe. When the 
local frame position (90, o) and the object’s geographic coordinates (p”, A”) are known, the corresponding local 
planar coordinates can be computed by the forward projection: 


x = RA" —A,)cosp, (4) 
y = R(Q" — po) (5) 


When the observer moves too far from the origin, the local reference should be translated in the new position and 
the 3D objects must be positioned with respect to the new reference system by re-applying the previous equations. 
To track the true value of the observer altitude, each time a GNSS measure is acquired for the observer’s altitude 
h’, its height in the local coordinate system can be stored for later use as height of the origin of the local frame. 
Consequently, given the altitude of the local frame (hg), its height in local coordinates Zp) and the altitude of an 
object (h”), the corresponding vertical coordinate z of an object can be computed by: 


z = h” — ho + Zo (6) 


If the observer vertically moves the objects at z’ to match their true height from the ground, the resulting true 
altitude is computed and stored. When the observer moves too far from the origin, the local reference should be 
vertically translated in the new altitude and the 3D objects must be positioned with respect to the new reference 
system by re-applying the previous equations. 


3.1.2 Indoor AR Registration Engine 


The AR registration in indoor environments, which contrarily to the outdoor ones are affected by restricted GNSS 
signals, required a dedicated solution different from the one presented in Section 3.1.1. For this reason, an Indoor 
AR Registration Engine, based on image comparison with survey data of the analyzed environment, has been 
developed (Fig. 4). A preliminary on-site survey with a camera and LIDAR scanner (e.g., GeoSLAM ZEB 
Horizon) must be carried out in order to collect point cloud and aligned photos of the analyzed environment. 
Alternatively, point clouds can be generated from a photos collection using the incremental Structure-from-Motion 
methodology implemented by the COLMAP library (Schönberger & Frahm, 2016). The basic idea is 6-DoF 
localizing the HoloLens2 by comparing a frame from its current view (i.e., query image) with images referenced 


114 


to the point cloud of the analyzed environment (i.e., reference images). Both reference images and the point cloud 
are stored in the BIM Cloud Platform. In this study, the Hierarchical Feature Network (HF-Net) technology has 
been implemented for image comparison (Sarlin, Cadena, et al., 2018; Sarlin, Debraine, et al., 2018). HF-Net 
consists of a CNN able to simultaneously detect feature keypoints and compute local and global descriptors for 
accurate 6-DoF localization. The “hierarchical” attribute refers to the HF-Net feature close to the humans’ attitude 
of naturally localizing, in a previously visited environment, with a “from coarse to fine” approach. In other words, 
humans first localize themselves by looking at the global scene appearance and subsequently inferring an accurate 
location from a set of likely places using local visual clues. This means that for each HoloLens?2 registration call, 
a coarse search, consisting in a global-descriptors matching between the query image and the reference images, is 
performed. Afterwards, a finer search based on a local-descriptors matching between the query image’s 2D 
keypoints and the point cloud’s 3D points covisibile in reference images is executed. Finally, a 6 DoF pose 
estimation of the query image is carried out by solving the Perspective-n-Point (PnP) problem (Kneip et al., 2011). 
The estimated pose is thus the rotation and the translation vectors that allow transforming 3D points expressed in 
the world coordinate system into the camera coordinate system. These parameters enable the indoor AR 
registration of HoloLens2’s gaze. 


The presented Indoor AR Registration Engine’s process (Fig. 4), once the survey of the interested area is completed, 
can be automatically executed (1.e., answer to RQ2). Hence, the indoor registration process, not requiring the user 
to do any particular actions, can find applicability even among non-expert users. 


COMPARE IMAGES ESTIMATE POSE 
(HF-NET) (PNP) 


SURVEY THE AREA 


° Point cloud (directly or 
indirectly by SfM) ¢  Global-descriptor matching Estimate position and heading 
. Referenced photos ° Local-descriptor matching for indoor AR registration 


Fig. 4: A schematization of the Indoor AR Registration Engine’s processes. 
3.1.3 The Switch Engine 


As illustrated in Fig. 1, the Switch Engine serves as a seamless integrator between the two types of registration, 
contributing to answer both RQ1 and RQ2. This component is a rule-based engine that enables seamless 
indoor/autdoor AR registration. The Switch Engine acts differently according to 3 possible scenarios: (i) RTK only 
(outdoor), (ii) RTK plus images/point clouds (outdoor), and (iii) images/point clouds only (indoor). In the first 
scenario, the Switch Engine identifies the availability of a stable and reliable RTK connection in outdoor scenarios. 
Consequently, the Switch Engine triggers the outdoor AR Registration Engine. The second scenario occurs when, 
in outdoor environments, images and point clouds are available simultaneously with RTK connection. The third 
scenario, instead, refers to indoor scenarios in which only images and point clouds are available. In both the second 
and third scenarios, as the Switch Engine identifies the presence of images and point clouds in the vicinity of the 
user’s real-world position (e.g., when approaching a previously surveyed building or asset), the system triggers 
the Indoor AR Registration Engine. At that point, the system entirely relies on images and point clouds for AR 
registration. 


4. EXPERIMENTS 
4.1 FM use case 


The methodology proposed in this study has been tested on a FM use case based on a university campus, assumed 
as case study. Specifically, the study focused on the FM of the Digital Construction Capability Centre (DC3) Lab 
at the Universita Politecnica delle Marche (Fig. 5 (a)). The DC3 Lab, which covers an area as large as 240 m?, is 
composed of a main open space, a changing room, an office, and the restroom. Within this context, the management 
of the electrical system, and in particular of the internal electrical panel of the DC3 Lab, has been considered. 
During this activity, the technician in charge of FM operations spends time first locating the electrical panel. Then, 
in order to find the root cause of the problem, the technician may be asked to locate the panel’s associated cabling, 
which extends externally to the building. These cables can be accessed through manholes located on the road in 
front of the building (Fig. 5 (a)). Once located all the elements interested by FM operations, the technician may 
need technical information about the electrical system. To this purpose, he/she needs to access the as-built BIM 
model (Fig. 5 (b)). The implementation of the proposed methodology (Section 3) enables the seamless AR 
registration in heterogeneous indoor/outdoor scenarios. Testing the proposed system to the presented use case, its 
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applicability in real-life situations will be assessed. As described more in detail in Section 4.2, the experimentation 
primarily revolves around the utilization of an AR application implemented on the HoloLens2 device. 


(b) 

Fig. 5: (a) Aerial view of the university campus, assumed as case study, identifying the positions of the DC3 Lab 
(i.e., red placemark), the manhole cover (i.e., blue placemark), and the RTK base antenna (i.e., green placemark); 
(b) view of the BIM model of the DC3 Lab identifying the positions of the indoor electrical panel (i.e., red 
placemark) and the outdoor manhole cover (i.e., blue placemark). 


4.2 Experiments design and execution 


The developed system has been tested on the selected use case (Section 4.1) following the steps summarized in 
this section. First of all, a preliminary set-up phase consisting in collecting and initializing input data has been 
executed. This phase must be executed only once since related input and settings are maintained. It includes: 


1. collecting point clouds and aligned photos by carrying out a survey of the DC3 Lab. In this study, the survey 
has been carried out by using GeoSLAM Backpack Vision (Fig. 6 (a)), which collects simultaneously both 
point clouds and aligned photos with a single scanning (Fig. 6 (b)). Alternatively, the survey can be carried 
out by collecting photos (e.g., with a smartphone) and then generating a point cloud through the Structure- 
from-Motion methodology implemented by the COLMAP library (Schönberger & Frahm, 2016); 


2. collecting the BIM model of the DC3 Lab (Fig. 5 (b)); 


3. uploading the BIM model, point cloud, and images, related to the selected use case, on the BIM cloud platform. 
It must be noted that point cloud and images result aligned directly from the survey. The BIM model must be 
aligned to the previous dataset by selected reference points; it must be noted that this alignment is executed 
only once since it is maintained. 


Once the previous preliminary steps are completed, the AR-based inspection of the electrical system related to the 
selected use case can start. In this study, the head-mounted AR device Microsoft HoloLens2 has been used (Fig. 6 
(c)). The AR application, based on the system architecture reported in Fig. 1, has been developed to support the 
technician inspecting the electrical system distributed in a heterogeneous indoor/outdoor scenario. The main 
contribution of the proposed system is the marker-less AR registration for displaying BIM models seamlessly 
superimposed to the whole inspected environment. The following steps have been executed on-site: 


4. having on the HoloLens2 with installed RTK receiver (Fig. 6 (c)) and launching the AR application; 

5. moving around the campus of the Faculty of Engineering at Universita Politecnica delle Marche. During this 
preliminary step outside the DC3 Lab, the Outdoor AR Registration Engine is triggered in order to localize 
the user and drive him/her to the DC3 Lab; 

6. heading to the internal electrical infrastructure to inspect the electrical panel located inside the DC3 Lab (Fig. 
5 (a)). As the user moves from the outdoor to the indoor, the GNSS coverage decreases and the collected 
dataset (i.e., point clouds and aligned images) is found in the surrounding of the user position. Hence, the 
system seamlessly switches to the Indoor AR Registration Engine. This transition occurs without interruption 
as the system switches between registration modes through the Switch Engine’s algorithm; 

7. inspecting the internal electrical infrastructure, specifically focusing on the indoor electrical panel. During the 
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indoor phase, the AR application relies on the Indoor AR Registration Engine. It superimposes the digital 
model of the electrical panel on the real asset to let the facility managers have all the required information 
from the BIM model for the inspection (Fig. 7 (a)); 

8. heading to the external electrical infrastructure to inspect the cablings associated to the internal electrical panel. 
As the user moves from the indoor to the outdoor, the GNSS coverage rises again and the system seamlessly 
switches to the Outdoor AR Registration Engine. This transition occurs without interruption as the system 
switches between registration modes through the Switch Engine’s algorithm; 

9. inspecting the external electrical infrastructure, specifically focusing on the manhole covers located on the 
street facing the building (Fig. 5 (b)). During the outdoor phase, the AR application relies on the Outdoor AR 
Registration Engine, leveraging geospatial data retrieved from the RTK receiver to overlay BIM data on the 
real asset (Fig. 7 (b)). 


(a) | (b) | | (0) 
Fig. 6: (a) Survey phase of the DC3 Lab by using GeoSLAM Backpack Vision for collecting (b) point clouds 
and aligned photos; (c) inspection phase with the Microsoft HoloLens2 and the RTK receiver integrated by a 3D 
printed add-on. 


| EE] a ee SL AIR 
(a) (b) 
Fig. 7: Visualization of the aligned holograms of (a) the indoor electrical panel and (b) the outdoor manhole 
cover through the AR application deployed on Hololens2. 


5. DISCUSSION AND CONCLUSIONS 


This paper addresses the open issue of AR registration in mixed scenarios considering that indoor-outdoor 
interactions are crucial for built environment management. An extended literature review has found limitations of 
existing indoor and outdoor AR registration approaches, drawing the conclusion that an all-in-one solution 
conceived for heterogeneous scenarios do not exist yet. Hence, this work focuses on defining and implementing a 
system architecture for seamless AR registration even during changes in environment (i.e., RQ1). In doing this, 
technical solutions that would make AR registration applicable even among non-expert users have been considered 
(i.e., RQ2). In order to answer RQ1, a system architecture (Section 3.1), which delivers a robust and marker-less 
AR registration system for both indoor and outdoor scenarios, has been defined and implemented. The resulting 
system has been tested on-site on a FM use case (Section 4.1). The proposed marker-less localization system has 
been put in place considering the FM of a university laboratory’s electrical system. It has been selected because 
in-charge technicians and facility managers are continuously asked to locate, inspect, and repair interrelated system 
elements distributed in a heterogenous indoor/outdoor environment. Efficiency of such activities is expected to be 
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considerably improved by accessing BIM models directly superimposed to their physical counterparts. In this 
study, the head-mounted AR device Microsoft HoloLens2 has been used for 6DoF localization and seamless gaze 
registration in heterogeneous environments. Indoor inspections of the electrical system have specifically focused 
on the laboratory electrical panel, whereas the outdoor ones on the related cablings that can be inspected through 
a dedicated manhole cover. The proposed system has shown promising results in registering the BIM model on- 
site. In fact, the holograms of the electrical panel and the manhole cover resulted superimposed to their physical 
counterpart even if they are located respectively in indoor and outdoor environments. This has confirmed one of 
the main contributions of the proposed system, that is a marker-less AR registration for displaying BIM models 
seamlessly superimposed to the whole inspected environment. In order to ensure a user-friendly AR registration 
process in mixed environments, hence providing an answer to RQ2, the proposed system requires only a 
preliminary set-up phase (i.e., steps 1-3 in Section 4.2) whose settings are then maintained. In fact, as confirmed 
by experiments, the AR experience is supported by the Switch Engine that manages priorities of the indoor and 
outdoor AR registration engines. In areas with GNSS coverage (i.e., generally outdoor environment), the Switch 
Engine delegates holograms superimpositions to the Outdoor AR Registration Engine. This is the logic that 
regulates the superimposition of the external manhole cover hologram to its physical counterpart. As the GNSS 
coverage is lost and collected datasets (i.e., point clouds and aligned images) are found in the surrounding of the 
user position (i.e., generally indoor environments), the Switch Engine delegates holograms superimpositions to 
the Indoor AR Registration Engine. This is the logic behind the superimposition of the internal electrical panel. As 
a result, since AR registration is automatically fulfilled in heterogeneous environments, the proposed solution can 
find applicability even among non-expert users. 


Current limitations of the proposed methodology can be traced back to technologies adopted by system’s engines. 
Accuracy loss may affect image comparison, adopted by the Indoor AR Registration Engine, in case of reference 
and/or query images with lack of context or repetitive elements. Follow-up studies may quantify such limitation. 
On the other hand, the Outdoor AR Registration Engine strongly relies on the availability of RTK coverage. 
Despite such technology is currently economically expensive, it is noteworthy that since the individual components 
of RTK systems are economically affordable, the proliferation of applications of this technology is highly expected. 
Further studies will be carried out in order to assess registration accuracy of both the Indoor and Outdoor AR 
Registration Engine. More tests must be carried out in order to optimize switching thresholds based on GNSS 
coverage and datasets (i.e., point clouds and aligned images) availability in the surrounding of the user position. 
Future developments will focus also on the definition of a graphical user interface for better managing the entire 
AR registration workflow. Finally, the proposed system will be provided to non-expert users in order to quantify 
its contribution in terms of saved time for completing a task. 
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ABSTRACT: Due to challenges in filling vacant positions and the heightened demands posed on existing staff, 
employers and project managers are progressively considering the recruitment of inexperienced individuals and 
seeking strategies to swiftly provide them with essential job-specific knowledge. The potential of industrial AR 
has been widely researched to support workers in overcoming skill-related knowledge and enhancing industrial 
processes. However, most studies focus on demonstrating technology usability across different processes and 
overcoming engineering hurdles on a case-by-case basis. There is no direct benefit analysis on how AR assists 
construction tasks at human motion level, and how to eliminate the ineffective motions and reduce the duration of 
effective motions. To fill this gap, this paper first establishes an AR-based near real-time object detection system 
of small tools and components involved in task processes for egocentric perception of workers in the construction 
industry. Later, the Standard Operating Procedure (SOP) for scaffolding assembly activities is deconstructed from 
a manual process into Therbligs-based elemental motions. Finally, this research conducted a comparative study of 
two prototypes across four dimensions of evaluation. As a step forward in this direction, this paper renews the 
connotations of Therbligs theory under industry 5.0 era, rethinks the AR-assisted construction task processes, and 
applies appropriate technologies enhancing the adaptability of AR technology for construction workers’ needs. 


KEYWORDS: Augmented Reality (AR); Microsoft HoloLens 2; Object Detection; Task Assistance; Therbligs 


1. INTRODUCTION 


The construction industry, characterized as one of the least digitized and toughest labor challenges, urgently seeks 
solutions for industry transformation (Liao, Iseley, & Behbahani, 2022). In Canada, nearly half of the construction 
employers are facing the obstacle of recruiting skilled employees over the next three months (Government of 
Canada, 2022). Meanwhile, increased project complexity implies a high degree of technical, organizational and 
environmental variability and uncertainty, which leads to greater risk and poorer performance by construction 
personnel (Pefialoza, Saurin, & Formoso, 2020). In terms of technical requirements alone, project contributors 
must possess a more reliable and advanced technical qualification (Trinh & Feng, 2020). From the organizational 
and demographic characteristics, most construction workers rather rely heavily on previous work experience or 
oral interpretation from peers than follow the correct mechanical operation steps or standardized operation 
procedure (Ke, 2018). Oyekan et al., propose one of the mental challenges faced by workers, highlighting that as 
task operations become more complex, the cognitive load on operators also intensifies. This heightened cognitive 
load can be reflected in various human mental reactions of “Search,” “Find,” “Select,” and other Therbligs. 
Consequently, due to the greater difficulty in filling vacant positions and the increased demands placed on 
employed personnel by the complexity of construction projects, employers and project managers are gradually 
hiring inexperienced newcomers and are eager to find countermeasures to equip them with the necessary job- 
related knowledge in a short time (Bittner, Prilla, & Rocker, 2020). 


Industrial AR, deploying Augmented Reality (AR) technology into dynamic industrial environments targeting 
inhomogeneous user groups, is seen as potential solution to above dilemma and gains more momentum in both 
academic and industry (Grubert et al., 2010). The potential of Industrial AR has been widely researched to support 
workers in industrial scenarios in overcoming skill-related knowledge and enhancing industrial processes (de 
Souza Cardoso, Mariano, & Zorzal, 2020). In the context of industrial AR, one of the more fruitful and practical 
projects is the AR-Driven Task Assistance system, which supports workers by providing real-time sequence of 
assembly operations, tools to be used and collision free assembly paths at the workplace (Eswaran & 
Bahubalendruni, 2022). Previous research and the authors of this paper have also contributed to the development 
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of a similar task-assisted AR prototype, which in this paper is expected to focus on the second function for 
highlighting recommended tools in the user’s view while performing a task. Oyekan et al. addressed the mentioned 
challenge by using Therbligs to embed intelligence in workpieces and make them interactable and communicable. 
They developed smart workpieces to actively participate in assembly operations by providing their location and 
operational sequence to an operator (Oyekan et al., 2020). However, their solution may encounter various practical 
issues, such as overloaded servers when many workpieces update their status simultaneously, mispositioning of 
sensors and LEDs leading to electronic faults, and greater weight and more challenging manipulation of 
workpieces due to the addition of extra components. 


It is obvious that CV and AR are theoretically linked and mutual fulfillment of each other, as OD could quickly 
identify and localize specific objects and draw bounding boxes around instances, and AR greatly extends users’ 
capability and experiences by directly presenting detected objects and their digital data in an immersive, interactive 
way (Z. Wu, Zhao, & Nguyen, 2020). Thus, this paper proposes another computer vision-based solution to the 
same challenge by integrating OD into an AR-driven task assistance prototype. The choice of deployed 
technologies is highly related to their ability and referred to prior successful cases. While researchers have 
established the effectiveness of Industrial AR and its widespread adoption in construction task assistance over the 
past two decades, most studies focus on demonstrating technology usability across different processes and 
overcoming engineering hurdles on a case-by-case basis (Kim, Olsen, & Renfroe, 2022; S. Wu, Hou, Zhang, & 
Chen, 2022). However, user-related assessment of AR assistance systems and worker-oriented effectiveness in 
industrial environments is not a major focus (Tao, Lai, Leu, Yin, & Qin, 2019). To be more specific, there is no 
direct benefit analysis of how AR assists construction tasks at the human motion level and how to eliminate 
ineffective motions and reduce the duration of effective motions. 


To fill this gap, this research first reports the further exploration of embedded object detection into the existing 
AR-driven task assistance prototype developed by authors. The existing AR prototype is targeted at construction 
workers without any previous work experience to conduct tasks from the beginning. But it only provides fixed 
information designed in advance about the activities and corresponding contents step by step. More advanced, the 
prototype developed in this research realizes a real-time detection of multiple scaffolding components, 
superimposes holographic texts, and gives hints about the correct selections which helps new industry entrants 
make the right choices from a wide range of tools and components. Later, the Standard Operating Procedure (SOP) 
of scaffolding assembly activity is decomposed from a human manual process into Therbligs-based elemental 
motions. It serves as both a specific example to enhance the understanding of Therbligs-based task processes and 
the foundation of subsequent benefit analysis. To present a more intuitive and clear effect, this research finally 
adopted a comparative study of a traditional AR prototype and an advanced AR prototype with object detection 
function from four dimensions of evaluation. It will demonstrate the superiority of the proposed prototype in easing 
cognitive load, eliciting contextual awareness, and reducing particular motion costs on Search, Select, and Find. 


The proposed pathway not only explores the possibility of fully exploiting the advantages of both Augmented 
Reality and Object Detection, but also allows novice workers to easily perform high requirements tasks with a 
satisfied completion accuracy. As a step forward in this direction, this paper renews the connotations of Therbligs 
theory under industry 5.0 era, rethinks the AR-assisted construction task processes, and applies appropriate 
technologies enhancing the adaptability of AR technology for construction workers’ needs. It is expected this 
research could inspire substantial discussions, enhance the implementation of AR-driven task assistance, and 
provide a valuable reference for construction workforce preparation. 


2. RELATED WORK 
2.1 Therbligs overview 


Therbligs, first invented by Frank Gilbreth during the early 20th century, is a collection of 18 elemental human 
mental and physical motions used to describe any task and analyze the motion economy in the workplace (Sung, 
Ritchie, Lim, & Medellin, 2009). The full collection of 18 Therbligs and their symbols used for depicting when 
performing work is shown in Table 1. It is useful to use Therbligs to analyze the impact of technology adoption on 
individual earnings (Wang et al., 2021). The overall efficiency and productivity of tasks will be significantly 
improved because less time wasted on non-value-added activities and more time spent on productive work. The 
selection of Therbligs to be analyzed and addressed in this paper is not random. Taking consideration of the 
capabilities of computer vision technology and extended understanding of Therbligs connotation in the context of 
the construction tasks, this research mainly focuses on elemental motions of “Search”, “Select” and “Find”, which 
is also complied with previous research of Oyekan et al. The description of chosen Therbligs and their connotation 
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in the context of the construction industry could be found in Table 2, gathering previous research and examples 
mentioned in the conversation with experts in both the ergonomics field and construction industry (David, 2000). 
For the application scenarios in this research, object detection function will be deployed to reduce the efforts 


needed by construction workers to search and select the tools and components they need. 


Table 1: 18 Therbligs with symbols (Ninjatacoshell, 2012) 


=> Search U Use 
<& Find Disassemble 
Select () inspect 
n Grasp h Preposition 
N Hold 7ON Release Load 
S Transport Loaded f^ Unavoidable Delay 
\/ Transport Empty -© Avoidable Delay 
9 Position $ Plan 
JE Assemble v Rest 


Table 2: Description of chosen Therbligs and their connotation in the context of the construction industry 


Therbligs Symbol Description (Niebel & Freivalds, 2013) Examples in the construction activities 
Search S Eyes or hands groping for object; A construction worker looks for the location 
begins as the eyes move in to locatean of a hammer in a warehouse. 
object. 
Select SE Choosing one item from several; A construction worker selects the 
usually follows Search. appropriately sized steel beam from a range 
of options. 
Find F Defines the momentary mental reaction A construction worker realizes that he had 


at the end of the Search cycle. 


found the correct 5 mm drill. 


2.2 AR for Worker Onboarding and Skill Development 


Industrial AR related to worker onboarding and skill development typically falls into two categories based on the 
research purposes and system functions: Step-by-Step Assistance AR and Hands-on Training AR (Butaslac, 
Fujimoto, Sawabe, Kanbara, & Kato, 2022). Both systems essentially start from the premise of breaking down 
knowledge barriers for people who do not have the ability or experience to perform the task contents. The 
difference between them is Hands-on Training AR will emphasize more on the knowledge stock after using the 
system and the ability to work independently when users are not equipped with system (Büttner et al., 2020), while 
Step-by-Step Assistance AR will emphasize the prompts, flexibility, and adaptive to users’ needs and facilitate 
quicker familiarization and more regulated execution of predetermined task procedures (Zhang, Xuan, Yadav, 
Omrani, & Fjeld, 2023). For the user groups and specific scenarios targeted in this paper, the system built can be 
categorized as a Step-by-Step Assistance AR system. 


2.3 Object Detection for AR 


Numerous papers have extensively explored the applications of AR and OD within the construction industry, 
individually. From data preparation for construction objects, the traditional object detection dataset in the 
construction context is a collection of various categories (e.g., materials, workers, and their behavior of wearing 
PPE or falling from height), messy site layout, and large objects (e.g., heavy equipment of crane, excavators, 
bulldozers, and backhoe diggers). Thus, this research establishes a near real-time object detection dataset for small 
tools and components involved in task processes for workers’ egocentric perception in construction industry. 
Besides, there exists relatively limited progress in cross-studies of AR and OD in this industry, despite its potential 
for significant advancement and promising opportunities. Wu et al. measured the utility and effectiveness of AR 
warning system on onsite construction workers with object detection for tracking onsite workers’ locations and 
dynamic hazard areas (S. Wu, Hou, & Chen, n.d.). 
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Meanwhile, several other industries, such as smart manufacturing, have conducted successful research on the 
integration of AR and OD, which can serve as valuable points of reference for further exploration in the context 
of construction industry. They highlighted possible computing pathways, which could be broadly categorized 
based on where the data are handled into server-side processing, running locally on the device, or both (Ghasemi, 
Jeong, Choi, Park, & Lee, 2022). Considering the trade-offs between computational requirement and device 
limitation, cost and latency tolerance, and network connectivity, this research will adopt Microsoft Azure Custom 
Vision library as it offers a complete high-level solution suiting for HoloLens computing capabilities and is more 
common for implementation (Lysakowski et al., 2023). 


Several publications investigated the utilization of Microsoft HoloLens 2 along with Azure Custom Vision services 
for object detection for different purposes (PatrickFarley, 2023). George created a training dataset of 215 images 
for motherboard and RAM in computer assembly task and reached an 80% match score even in varying 
environments, but also reveals challenges in limited experimental sample of three participants and false positives 
in similar components (George, 2021). Fuglseth created a proof of concept program for specific objects recognition 
and text information visualization in users’ view with an open-source Microsoft COCO dataset (Fuglseth, 2022). 
Although this research demonstrated the technical feasibility of general objects detection in daily life, it lacks a 
specific use-case context and highlights the limitations of single object detection at a time. Casano used 9 specific 
classes of the COCO dataset and successfully implemented Azure Custom Vision object detector in the HoloLens 
for assisting and supporting users for better life or easier work style (Casano, 2021). This paper introduced a more 
mature and customized system by integrating eye motions, gestures, and voice commands, but faced limitations 
of predictions efficiency and more rigorous evaluation. Their pipelines to realize the object detection function are 
similar with each other. HoloLens will take a picture based on users’ commands and uploads the picture to the 
Azure Custom Vision API. After successfully identifying, a label or images will be placed in users’ AR view for 
easier awareness. These studies underscore the growing significance and feasibility of object detection in AR 
applications, point out challenges associated with their specific applications, and also highlight the potential for 
further improvements. It includes realizing a real-time object detection, testing its usability in practical application 
scenarios of industrial tasks, and developing a more powerful AR system to support more reliable multi-objects 
detection results. 


3. PROTOTYPE DESIGN AND DEVELOPMENT 
3.1 Prototype Overview 


The fundamental idea behind the envisioned prototype involves the utilization of HoloLens 2 as an aiding 
instrument for promptly identifying objects in near-real time. Its primary function is to aid workers in identifying 
the precise tools and components required for the ongoing task phase. The target users for this prototype comprises 
generic individuals who lack prior work experience but seek rapid acquaintanceship with task-related details to 
ensure adherence to standards. For example, a novice construction worker aims to efficiently select tools and 
components aligned with the day’s designated task and make high-quality commitments to their work activities. 
The object detection function will be triggered by users’ voice commands, touch buttons, or gestures. The objects 
will be identified according to the current task step and its mentioned tools or parts. Once related objects are 
successfully recognized, the list of expected results will be three main parts (Farasin, Peciarolo, Grangetto, 
Gianaria, & Garza, 2020). A bounding box is a rectangle area that represents the object and its region. The class is 
a tag of the most probable category that the object belongs to. The probability score is the confidence level of 
algorithms in the detection accuracy and serves as a critical criterion for accepting or rejecting results. All 
information included will be displayed in the AR view as a visualized cue to workers. 


3.2 Hardware and Software 


Trimble XR10 with HoloLens 2 - Full Brim Hardhat is an integrated device in which a construction hard hat 
ensures easier wireless use in safety-constrained environments and the HoloLens 2 is the most commonly used 
XR headset. Microsoft HoloLens 2 is an ideal platform with high-tech hardware features for computer vision 
research, and also provides scalability of cloud services and connection to Microsoft Azure AI platform 
(Ungureanu et al., 2020). It sets a suitable equipment base that could serve multiple roles in proposed research and 
subsequent research, such as the source for capturing data in the form of video and frames, a computer of executing 
detection functions, and the tool for visualizing processed data and related task information (Qin et al., 2023). The 
proposed prototype is developed in the Unity cross-platform graphics engine (version 2022.3.3f1 LTS) using C# 
as programming language and MRTK (Mixed Reality ToolKit) packages for assets and interactive UI creation. 
Currently, this research adopted Microsoft Azure Cognitive Services to deploy object detection function by using 
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REST APIs and client library SDKs. 
3.3 Object Detection using Azure Custom Vision in HoloLens 


Microsoft Azure Custom Vision services enable users to rapidly customize cloud-based computer vision models 
and simply manage it using REST API calls. The overall design of architecture is shown in Figure 1. This research 
created the computer vision project in Azure Custom Vision portal and labelled a total of 12 classes including 
1,224 images and 2,008 objects. The training model and its performance is further illustrated in both Section 4.4 
and Section 5. The AR application, developed in Unity, adopted MRTK for user interaction and established a 
connection to the Azure Custom Vision API through endpoints. The Azure platform authenticates the AR system 
using Azure credentials and provides external GPU computational capabilities to preprocess the images and send 
the response through Wi-Fi. Once HoloLens receives object detection results from Azure Custom Vision, display 
holograms or annotations in the user world to indicate the detected objects. 


P cure AR Training and Implement 
HoloLens User Input s 
D 
id image èr 
> 


too 


Figure 1 Pipeline design of using Azure Custom Vision and HoloLens 2 


Information Visualization 


4. EXPERIMENT DESIGN FOR EVALUATION 


The purpose of designing experiments is to provide a comprehensive insight into whether different forms of AR 
and varying technologies involved might impact user performance on the motion level. This study is designed with 
two independent variables: the complexity of construction tasks and the assisted tools used by participants to 
accomplish these tasks. Each variable is further subdivided, as task complexity has two levels (referred to as Task 
1 and 2 hereafter), and using task-assisting tools comes in two types. Task | of the Miter Saw Stand assembly is 
the most complicated to build due to all the extra pieces and steps participants need to follow to make it work 
properly. Task 2 of the Scaffolding assembly is straightforward to understand what the task entails, but it’s also 
easy to assemble it incorrectly and skip steps on some safety details. Detailed descriptions of these task 
specifications can be found in the subsequent section (section 4.3). The first type of assisted tools is a conventional 
AR prototype, which presents participants with guided text, images, and videos (referred to as Prototype 1 
hereafter). On the other hand, the second type employs a more advanced AR prototype that includes object 
detection functionality to highlight crucial components for users (referred to as Prototype 2 hereafter). 


4.1 Hypotheses 


This research is formulated following hypotheses: H1: When using Prototype 2, participants were able to make 
fewer mistakes and complete the task with higher quality. H2: When using Prototype 2, participants were able to 
complete the task more efficiently, spending less time on “Search”, “Select”, and “Find” Therbligs. H3: When 
using Prototype 2, participants’ cognitive demands were lower, and they can obtain better understandability for 
task contents and unfamiliar tools. H4: When using Prototype 2, participants think it is more intuitive, efficient, 
and enjoyable to use. 


4.2 Bias Control 


Potential bias and the effects of irrelevant factors, such as participants’ familiarity with AR concepts and interaction, 
existing skills or learning curve, are more or less to interfere with the experiment results. The counterbalancing 
design principle and within-subject principle are throughout the entire experimental design and adopted controlling 
measures are stated as follows. Given that participants might have varying levels of proficiency with AR, some 
individuals are experienced users and developers, while others have a more superficial understanding [21]. Though 
researchers prepare an illustrative ppt for introducing experiments, explaining the meanings of prototype UI and 
panel, and briefly showing how to perform the sample tasks using different prototypes, researchers concern 2D- 
based explanation is less intuitive than real experience with device. Therefore, the prototype provides a 
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comprehensive quick start guide to build perceptual awareness of AR capabilities and narrow knowledge gap 
among all participants. To mitigate the potential order effects and learning curve, there is also a clear definition of 
how to assign the participants into groups and decide their starting sequence. All participants will be randomly 
divided into two Group A and B. Both Group A and Group B will be exposed to two kinds of prototype and operate 
the same task, assembly Miter Saw Stand in the test phase 1 and assembly Scaffolding in the test phase 2, as shown 
in Figure 2. The difference between the two groups is Group A will start using Prototype | first, then they will shift 
to Prototype 2 in the test phase 2, while Group B will use two prototypes in reverse order. 


4.3 Task Specification and Therbligs-based Information Presentation 


As the basis of experiment design, this research selected the Metaltech Multipurpose 4-in-1 6 ft. Baker Scaffold 
as the specific user case for experiments, as the scaffolding is a typical and normal task in the construction site 
with higher hazards (Khan, Saleem, Lee, Park, & Park, 2021). Rigorously building up scaffolds is a vital safety 
management measure and prevents potential serious individual accidents. In terms of task content design, it can 
also be converted into another form of miter saw stand, which is a routine task for carpentry but different from 
those steps and used components in scaffolding assembly. Meanwhile, the difference in difficulty levels between 
the two tasks makes this choice more suitable for designing an experiment. After field assembly by three 
researchers, they agreed that the miter saw stand was the most complex and difficult to build, while the remaining 
three types were not as difficult to distinguish. As mentioned in previous section, the effect of decreased physical 
strength on experimental effectiveness, as well as increasing the difficulty differentiation between two tasks, two 
researchers worked together to simplify the task of scaffold assembly. Subjects would neither assemble the upper 
ladder to the lower part to prevent a total shelf height of about two meters, nor would they rotate the assembled 
shelf up and down. In the step of transitioning between the scaffold and the miter saw stand, they won’t flip the 
entire platform due to its potentially harmful weight and surface area. 


MITER SAW STAND 


Figure 2 Selected Multipurpose 4-in-1 6 ft. Baker Scaffold for Experiment (The Home Depot, 2022) 


Figure 3 shows the partial sequence of the assembly scaffold in the form of “Search”, “Select”, and “Find” 
Therbligs. The experiment workplace setup is shown in Figure 4, where all components are lying on the ground 
and a nearby shelf is for PPE equipment. “Search” Therbligs is reflected in locating the same type of tools from 
numerous building components in the package, such as searching ladders, braces, and locking pins. “Select” 
Therbligs happens less often than search, which is embodied in choosing the right one from a variety of similar 
things or alternatives, such as selecting a lower ladder from ladders where the upper ladder is a misleading option 
for participants (as shown in Figure 5). “Find” Therbligs is a momentary mental activity that is reflected in the 
participant starting to move on to the next activity, such as the participant grabbing the searched brace and locking 
it in place using the U-lock kit. 


“Search” f> Personal Protective Equipment (PPE) on the tools shelf 
t 
“Select” f> PPE for material-specific tools to the construction task 
t s 
“Search” f Ladders (both upper and lower part) locatior 
C 
“Select” me Specific lower ladder for the task 
Fask 2: Scaffolding 
“Search” > Location of rolling wheels and their locking pins 
—— 
“Search” *2 H Location of Brace piece 
Y 
"Search" f> Location of Platforr 
“Search” f> Location of locking pins. 


Figure 3 Partial sequence of assembly scaffold in the form of “Search”, “Select”, and “Find” Therbligs. 
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Figure 4 Experiment workplace setup. Figure 5 Experiment setting example for “select”. 


4.4 Dataset Preparation 


High-quality image data for target objects matters a lot to create a robust object-detection model (Lee, Jeon, & 
Shin, 2023). As shown in Table 3 and Table 4, this present study constructed a dataset to train and test the detection 
model, which took a total of 1,224 images, including 2,008 objects with 12 categories of classes. The dataset is 
compiled by two researchers using four devices of Hololens, iPhone, iPad, and Android Phone. This collection 
covers a diverse range of angles, lighting conditions, and backgrounds, drawing from the environments of two 
distinct research laboratories. In addition to sufficiently high-quality images of the objects in question, another 
important thing is the quality and quantity of annotation. The two researchers agreed on the labeling to ensure that 
the bounding box was strictly around each object. When one researcher’s annotation is complete, another 
researcher will cross-review each annotation result to ensure consistency. The quantity of each tag is roughly above 
120 images, which makes the distribution even and not biased. 


Table 3 List of scaffolding components for object detector training 


No. QTY. Used in EXP. Class No. Annotated Images 
1 2 Lower ladder 137 
2 1 Platform 144 
3 4 Mounting bracket 194 
4 2 Piece support 132 
5 2 Brace 147 
6 2 Shelf brace 169 
7 1 Wire grid shelf (S) 216 
8 5 Wire grid shelf (L) 217 
9 4 5 in. caster 168 
10 10 Locking pin 168 
11 2 Anti-tip assembly 183 
12 2 Tightening knob 128 
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Table 4 Examples of Captured Images 


1. Lower ladder 2. Platform 3. Mounting bracket 


di 
g 
r 


vu 


4. Piece support l : 6. Shelf brace 


Ai 


9. 5 in. caster 
10. Locking pin 11. Anti-tip assembly 12. Tightening knob 


4.5 Procedure and Evaluation Metrics 


The overall experiment procedure is shown in Figure 6 an estimated duration of 20-25 minutes. Three GoPro 
cameras, each at an angle of 120 degrees to each other, were used to record the entire experiment to facilitate the 
subsequent analysis of the participants’ Therbligs-based movements. Each participant will be through four steps: 
one preparation step, two test steps, and one after-testing step. After both Test Phase 1 and Test Phase 2, participants 
will be required to fill in a designed after-testing questionnaire immediately to express their direct subjective 
perception. At the end of the experiment, participants were allowed to volunteer for a short interview to express 
their opinions on improvements to the system prototype, their willingness to accept the technology, and other 
feelings not covered in the questionnaire. The data acquisition is based on a four dimensions evaluation: Quality, 
Efficiency, Mental Demand, and User Experience. The purpose of quality assessment is to detect the degree of 
precision in the work of the participants. This is done manually by experimental observers and is scored based on 
a complete error protocol. The error protocol describes errors in detail for two tasks, scores each of the two 
archetypes, and assigns three levels of scores according to the severity of the error (Wolf et al., 2021). The number 
of errors and the weighted total error score were finally statistically analyzed. Efficiency can be assessed in two 
ways. On a macro level, the overall time spent by each participant in completing the tasks using the two prototypes 
will be compared. On a micro level, experimental observers will use a timer to calculate the duration spent on the 
motion level of each Therbligs. This paper will explore to what degree user-centered process-oriented object 
detection has a significant effect on “Search”, “Select”, and “Find” Therbligs. Mental demand and user experience 
are mainly obtained through validated questionnaires and supplemented with optional semi-interviews. 
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Participants Action | Procedure Experiment Details 


All participants 
J 


Random Grouping 


Preparation: 

Experiment introduction, - Sign consent forms 
and required documents; 
~ Quick start gusde: 


f y 


> . Prototype 1: A conventional AR prototype: 


. : | 
onpa Group B Test Phase 1: 
' Task 1: Miter Saw Stand; 
Complete Task 1 Complete Task 1 


jer Prototype 1 


~ Prototype 2: An advanced AR prototype with 


Fill in Questionnaire for Fill in Questionnaire for Object detection function, 


Phase 1 Phase 1 


Crossover L 


Y . y 
G B G A >l Test Phase 2: 
srow Frou á 
sil fad - Task 2: Scaffolding 
Complete Task 2 Complete Task 2 
under Prototype 1 under Prototype 2 
Fill in Questionnaire for Fill in Questionnaire for . 
Phase 2 Phase 2 Evaluate Metrics and Data Acquisition: 
\ J Quality; 
~‘ i >} - Efficiency 


Mental Demand. 
Interview and - User Experience; 


Free Exploration 


Figure 6 Experimental comparison between conventional AR and advanced AR with object detection 


5. OBJECT DETECTION MODEL TRAINING RESULT 


After training the model using 3 hours budget with General (compact) domain, the second-iteration training ended 

with 85.6% precision, 86.4 % recall, and 92% mAP. It is noted that when trying to train for more budget hours, the 

results remained the same which indicates that object detection model reaches its limitation by using current dataset. 
These metrics provide critical insights to evaluate the accuracy and effectiveness of object detection models. 

Precision is how many of the predicted instances are to be actually correct, recall gauges how well the model is 

capturing all the relevant correct instances, and mAP (mean Average Precision) represents the overall performance. 

This proposed model has a relatively higher performance in the mAP, which means that the model achieves a good 
balance between the precision and recall across different thresholds. 


The excellent performance of this model is not only reflected in the data metrics, but also in the test images, as 
shown in . All objects, including very small Tightening Knob and Anti-tip Assembly objects, were successfully 
recognized one by one with a high success rate of more than 50%. However, there is still room for improvement 
in this model, as shown in Figure 8. As we mentioned above, we used the upper ladder as a misleading option, 
allowing participants to select the correct one from the two ladders for subsequent experiments. The trained model 
was unable to effectively distinguish between the two ladders when recognizing similar ones. 


Tag Probability 


Figure 7 Examples of Object Detection Results from developed model 
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Figure 8 Improvements needed in the model. 


6. CONCLUSION AND FUTURE WORK 


This research further developed the AR-Driven Task Assistance prototype by integrating object detection function 
to reduce elemental motions of “Search”, “Select” and “Find” in the Therbligs theory, and designed experiments 
to verify its direct benefits for construction workers and ease cognitive demanding during performing tasks. By 
integrating real time object detection into an AR-driven task assistance prototype, it is expected to enhance 
construction workers’ perception and situational awareness with a wearable, hands-free AR headset, which won’t 
interfere with workers’ current activities and enable a relatively larger and flexible Field of Vision (FoV) than 
mobile phone or tablets (Lysakowski et al., 2023). 


This research is limited to a single pipeline to realize proposed application scenario, which leaves a researchable 
question on “Is there an optimal solution for the same function”. Since existing methods do not discriminate well 
between similar things, it is worth further improving the algorithm or exploring other publicly known algorithms 
in this domain. Besides, though there are some publications realizing a similar function on HoloLens, it is still 
worth comparing the performance of different algorithms by using a dataset of the same quality, diversity, and 
complexity. What’s more, this research is proposed to deploy real-time object detection into an AR-driven task 
assistance prototype and also verified by scaffolding and miter saw assembly activity, which is aimed at solving 
practical problems faced by construction industry. However, despite construction industry, other industries might 
encounter similar issues and challenges awaiting to be further improved. This leaves future efforts to generalize 
this proposed pathway to other industries and slightly adjust to their specific challenges. 
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VISUALIZATION OF WEATHER-AWARE AMBIENT HEAT RISKS 
WITH GLOBAL ILLUMINATION IN GAME ENGINE 


Naotaka Sumida, Taira Ozaki, Satoshi Kubota, Dan Hiroshige & Yoshihiro Yasumuro 
Kansai University, Japan 


ABSTRACT: In recent years, the risk of heat stroke has been increasing due to global warming and other factors, 
and the Ministry of the Environment has been using the heat index to alert people in urban areas. Still, citizens 
need help knowing the detailed risk information for their neighborhoods. The heat index considers the human 
body's heat balance and is measured with specialized instruments, so comprehensive and high-density 
measurement is difficult. In this research, by using global illumination (GI) and primary weather data obtained 
from the Open Weather Map API for each area, we can realistically render the sunlight condition considering the 
weather and map and visualize the heat index per pixel based on shaded CG. Furthermore, by reconstructing 
urban 3D geometry from Google Maps, we have developed a system that visualizes the ever-changing heat index 
distribution for an arbitrary location in real time. The system has shown the possibility of reducing the number of 
heat stroke patients by using this system. 


KEYWORDS: heat index, WBGT (wet bulb globe temperature), real-time global illumination, game engine 


1. INTRODUCTION 


The number of heat stroke cases and deaths continues to increase in modern society due to the effects of severe 
climate change caused by global warming and the heat island effect (Ministry of Health, Labour and Welfare, 
2013~2020).In particular, the proportion of heat stroke patients who suffer heat stroke when outside has reached 
approximately 60% of all cases (Ministry of Internal Affairs and Communications, 2022), and the possibility that 
heat stroke is not limited to certain times of the day is also increasing (Ministry of Health, Labour and Welfare, 
2018-2022).In addition, the WBGT (Wet Bulb Globe Temperature), an indicator of heat stroke risk provided by 
the Ministry of the Environment, showed that the value in 2022 exceeded the average value of the previous ten 
years and that the upward trend is continuing (Ministry of the Environment, 2022). Daily life activities require 
outdoor maintenance and cleaning work, visits to neighborhood stores, etc., and from the viewpoint of health 
promotion, visits to parks have become habitual activities for a wide range of age groups (T. Ozaki et al., 2019). 
The Ministry of the Environment publishes WBGT values for each city on the Web to alert the public (Ministry of 
the Environment, 2023). However, as one WBGT value is assigned to a metropolitan area, the outdoor environment 
of each location, comprising different geographical features such as city blocks, parks, and construction sites, is 
not considered. Facility and site managers must accurately understand the environmental risks visitors and workers 
face. In addition, schools and other educational institutions are required to regularly measure WBGT before and 
during outdoor activities such as sports festivals and excursions to ascertain the level of risk so that classes and 
activities can be conducted more safely (Ministry of the Environment, Ministry of Education, Culture, Sports, 
Science and Technology, 2021). However, there are limitations to deploying a large number of WBGT measuring 
devices to collect information. Each individual needs to make decisions and respond based on their own experience. 
In this study, we propose a new method to visualize the distribution of heat index under sequential changes in the 
sunshine environment and provide it to general users by using regional meteorological data and 3DCG-based 
surface solar radiation estimation using global illumination (GD. 


2. PREVIOUS WORK 
2.1 Okada-Kusaka black-bulb temperature estimation formula 


WBGT is an index focusing on the heat balance between the human body and the outside air and is calculated 
using equation (1), considering the surrounding thermal environment such as humidity, solar radiation, and air 
temperature. A black-bulb thermometer is required to measure radiant heat, but Okada et al. point out that it is 
difficult to measure the temperature stably and continuously. Therefore, Okada et al. estimated black-ball 
temperatures using total solar radiation, wind speed, and dry-ball temperature as explanatory variables (M. Okada 
et al., 2013). The estimation equation (2) enables the estimation of black-ball temperature from meteorological 
data on total solar radiation, wind speed, and dry-ball temperature, making it possible to calculate WBGT estimates 
not only for sunny days but also for a wide variety of weather conditions. 
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WBGT = 0.7 x Wet Bulb Temp. +0.2 x Black Bulb Temp. +0.1 x Dry Bulb Temp. (1) 


Global Solar Radiation — 38.5 


Black Bulb Temp. = | 9577 x Global Solar Radiation + 4.35 x Wind Speed + 23.5 


+Dry Bulb Temp. (2) 


2.2 3DCG-Based Estimation Method of WBGT and Its Application 


Yasumuro et al. proposed a method to estimate WBGT by optical calculation using GI for 3DCG rendering to 
visualize the effect of green shade against heat, as shown in Fig. 1(Y. Yasumuro et al., 2018). First, standard 
reflectors placed under various shades of green were photographed with a single-lens reflex camera at fixed 
exposures, and the pixel values were converted to absolute luminance values. On the other hand, the total solar 
irradiance at the position in the reflector corresponding to each pixel is also measured, and a linear regression 
analysis is used to determine the correlation between the absolute luminance and the total solar irradiance. WBGT 
can be estimated from Equation (1) by obtaining the black-bulb temperature using the estimation equation (2) of 
Okada et al. based on the total solar irradiance calculated using this correlation equation and primary 
meteorological data such as dry-bulb temperature, wet-bulb temperature, and wind speed, which indicate the 
conditions at the target location. Furthermore, Yasumuro et al. have made it possible to estimate WBGT without 
the need for on-site photography by realistically rendering shades using GI with CG that virtually sets the same 
reflectance characteristics as the standard reflector shown in Fig. 2. By preparing a 3D model of a landmark, it is 
possible to reproduce solar radiation conditions in 3DCG based on the latitude and longitude of the landmark and 
its position concerning the sun, making it possible to determine the heat-protection effect of green shade at any 
given time. In this research, the GI that reproduces photorealistic solar radiation conditions in CG requires a large 
amount of light path search, and the generation of CG takes time each time the conditions are changed. 
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Fig. 1: Process chain for estimation of WBGT and visualization as heatmap image 


3D model of greenway 


Fig. 2: Example of heat simulation of a greenway with varying plantings 
(left: Current planting condition, right: Doubled planting condition) 
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2.3 Visualization of Environmental Heat Risks with Global Illumination in Game 
Engine 


The authors proposed a method to visualize heat index distribution for ever-changing sunlight conditions 
regardless of location and stream it to general users on the Web by using a game engine utilizing real-time GI to 
speed up the generation of solar radiation CG by GI and developing a system to obtain basic weather information 
in real-time as shown in Fig. 3 and demonstrated the effectiveness of this function (N. Sumida et al., 2022). 
According to the estimated WBGT value, a function is implemented to visualize the risk of heat stroke as a multi- 
step heat map using CG by setting colors and interpolation processing using the UV coordinates of the texture 
based on the colors of the danger levels shown by the Ministry of the Environment in Fig. 4. Although the 
visualization results are provided universally to the user's terminal via a web browser, this method requires 3D 
data on the terrain covering the target area for generating CG, and there are many areas where 3D data are not 
publicly available, limiting the applicable target areas. In addition, the calculation of WBGT requires primary 
meteorological data for the area. Still, a system that can automatically collect and reference these by region and 
time needs to be solved. Systematization to solve these problems remains challenging to realize an information 
service that presents heat risks for any given location and time in response to user requests. 
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Fig. 3: CG of ground shading with standard diffuse reflection (left) and resultant WBGT heat map (right) 
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Fig. 4: U-V coordinate of the color texture based on Ministry of the Environment guidelines 
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3. PROPOSED METHOD 


In this study, we propose a system that collects and utilizes necessary data by employing weather data services and 
3D information services available on the Internet for an application server with a WBGT estimation function, 
which the authors have already constructed. Fig. 5 shows the processing procedure of the system proposed in this 
study. In the real-time GI used by the application server, many image data (called light probes) mapping light 
source information in all directions are placed in the target space, and information on many light ray paths, 
including inter-reflection by static objects such as buildings, is pre-calculated as textures. By tracing light probe 
information, the cost of calculating the synergistic effects of light rays can be significantly reduced, and high- 
quality computer graphics that include effects such as indirect light can be rendered in real-time. (SILVENNOINEN, 
A et al., 2017) (K. Kurachi. 2007). Although the target scenes of this study include complex shapes such as trees 
and buildings, dynamically changing geographic objects are not required, making the system suitable for 
introducing real-time GI. The user specifies the date, time, and location from a Web browser on the terminal at 
hand and accesses the application server via the network to use this application. The application server uses the 
date, time, and location information specified by the user as a query, extracts the corresponding weather data and 
3D data from the database, generates CG of the sunshine conditions, and estimates WBGT. The weather data 
collection server automatically obtains weather information using weather data services provided through public 
APIs and sensor networks of instruments installed in the region. It stores the relevant information in the weather 
information database. The 3D data collection server automatically collects 3D data and material information of 
publicly available geographic objects and stores this information in a 3D model database. For areas where 3D data 
is not publicly available, a database is created by reconstructing 3D models from map services that can be viewed 
in 3D on the Internet. By designing the database with the data items required for WBGT calculation in the 
application server as attributes, data collected from different information sources can be effectively utilized. The 
3D data collection server automatically collects 3D data and material information of publicly available geographic 
features and stores this information in the 3D model database. For areas where 3D data is not publicly available, a 
database is created by reconstructing 3D models from map services that can be viewed in 3D on the Internet. This 
system configuration is expected to increase the affinity with the information provided by other services already 
widely used as information infrastructure and extend the range of applications of this method. 
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Fig. 5: Proposed System Process Chain 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


4. IMPLEMENTATION AND VERIFICATION 


This study utilized Unreal Engine 4 (UE), a game development platform with real-time GI and sun placement 
functions based on geographic coordinates and functions equivalent to the exposure settings of a single-lens reflex 
camera, which can be used to generate realistic computer graphics. An input interface shown in Fig. 7 is 
implemented in the UE to allow users to select specific locations. After entering a place name, a button for selecting 
the corresponding place name is displayed. When the user clicks the button, the user is automatically redirected to 
a map showing 3D data of the corresponding area. When the user clicks on a button, the user is automatically 
redirected to a map displaying 3D data of the corresponding area. When the place name is searched by prefecture, 
all buttons for the corresponding city, town, or ward are listed and can be selected in a scrolling format. The method 
of setting the date and time and the output process of the heat map of WBGT distribution after setting the date and 
time are based on the authors! existing method. 


The weather data collection server in the proposed system uses Open Weather Map API provided by Open Weather 
to obtain publicly available weather forecast data. Open Weather Map API is suitable for this study because it can 
provide basic weather information required for WBGT estimation, as well as information on specific locations and 
times required for CG generation using real-time GI. In addition, a DAVIS VantagePro2 sensor is used to collect 
real-time weather information at individual sites. The Open Weather Map API and the VantagePro2 are suitable 
for this study because they are commercially available and can be installed at individual sites. Considering that 
VantagePro2 sends data in Json format to the weather data collection server, as shown in Fig. 7, we implemented 
a parsing function in the application server UE to analyze the received Json data and sort them into attributes 
suitable for this system. This implementation makes it possible to comprehensively process the acquired 
information and convert it into a format suitable for the proposed system, even when the data sources differ. 


Tokyo 


Chiyoda City 


Fig. 6: Pull-down menu interface for selecting location incorporated in the game engine 


Fig. 7: Json data from Open Weather Map API (left) 
and Json data from VantagePro2 (right) 
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The authors implemented a system that enables data acquired by a weather data collection server to be stored in a 
weather data database created in the relational database service (RDS) of AWS, a cloud service provided by 
Amazon, Inc. In addition to the basic meteorological information necessary for WBGT estimation, the database is 
designed to prevent data inconsistencies by setting latitude/longitude and place name information necessary for 
location identification as attributes. However, although wet-bulb temperature is required to calculate WBGT from 
Equation (1), some measurement equipment measures humidity instead of wet-bulb temperature. Open Weather 
Map API and VantagePro2 used in this paper do not measure wet-bulb temperature, so they calculate an 
approximation of wet-bulb temperature from humidity and dry-bulb temperature. If the equipment measures wet- 
bulb temperature, it is directly stored in the database, and if it cannot measure wet-bulb temperature, the system 
calculates an approximate value. 


The 3D data collection server in the proposed system acquires extensive and detailed 3D city data publicly 
available, such as PLATEAU provided by the Ministry of Land, Infrastructure, Transport and Tourism in Japan. 
Asa method of storing the acquired 3D data, we are considering a method for directly storing the data in a database 
or storing the 3D data in a cloud service such as Dropbox or Google Cloud Platform (GCP) and storing only a link 
to the destination in the database. In cases where open-source data is unavailable, we adopt a method of 
reconstructing 3D data from information sources such as 3D data from online map services such as Google Maps. 
The verification procedure is to set multiple waypoints in Google Maps, as shown in Fig. 8 (left), and create a 
KML file that contains geographic coordinate information to specify the route of the viewing viewpoint. By 
importing and executing the created KML file into Google Earth, it is possible to capture virtual aerial images that 
simulate UAV flight as shown in Fig. 8 (right), and through 3D reconstruction by SfM using photogrammetry, 3D 
data acquisition, as shown in Fig. 9, is realized. With the above implementation, obtaining the data necessary to 
generate a heat map is now possible using only the specific date, time, and location information entered by the 


user. 


Fig. 8: Waypoints specified by our KML file depicted on Google Maps (Osaka: Suita, Japan) (left) and 
the captured virtual aerial photos through the waypoints in Google Earth (right) 


Fig. 9: 3D reconstruction based on SfM using the photo images in Fig. 8 (Osaka: Suita City, excerpts, Japan) 
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5. EXPERIMENTAL APPLICATIONS 
5.1 Case Study in Shibuya and Shinjuku City, Tokyo 


The following is an example of applying the proposed system to 3D data of Shibuya and Shinjuku city obtained 
from PLATEAU, a 3D data utilization service provided by the Ministry of Land, Infrastructure, Transport and 
Tourism (MLIT). Furthermore, we show that the system can visualize the detailed distribution of heat stroke risk 
during the day and night by combining the system with time-dependent weather data stored in the weather 
information database, as shown in Fig. 10 and Fig. 11. At the time of writing this paper, the weather information 
for August 2023 was not available, so the data for August 11, 2022, was used for CG generation. 


5.2 Examples of heat stroke risk prediction validation 


During the Golden Week (early-May holiday season in Japan) of 2023, temperatures exceeding the average of 
18.8°C announced by the Japan Meteorological Agency were observed. The maximum temperature in Tokyo on 
May 6 reached 27.9°C, which potentially increased the risk of heat stroke for people outside the city. This result 
shows that the risk of heat stroke in front of Shinjuku Station was at a level where it is recommended to actively 
hydrate oneself (Fig. 12). This result suggests that even during periods when the risk of heat stroke is generally 
recognized as low, the risk may still exist, and this system is an effective means of verifying this. 


Fig. 10: Heat map of Shibuya City during the daytime 
August 11, 2022, at 10:00 (left) August 11, 2022, at 14:00 (right) 


—— cay 
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Fig. 11 Heat map of Shinjuku City at night 
August 11, 2022, at 18:00 (left) August 11, 2022, at 22:00 (right) 


Fig. 12: Heat map of Shinjuku City during the day May 6, 2023, 12:30 p.m. 
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5.3 Case Study in Suita City, Osaka 


For areas where open-source 3D data such as PLATEAU (see (1)) is not available, this method can be applied to 
3D reconstruction using SfM from existing services such as Google Map's 3D view to generate a heat map as 
shown in Fig. 13. By importing kml files containing geographic coordinate information into Google Earth, it is 
possible to take aerial images that simulate UAV flights. We have shown that large-scale collection of 3D data is 
possible at low cost by using Google Earth to take comprehensive aerial photographs of surrounding areas. 


Fig. 13: Suita City: Heat Map 
(3D reconstruction based on SfM), August 2, 2021, 12:52 PM 


6. CONCLUSION 


In our research, we developed a system that utilizes public weather information and 3D data using a game engine, 
enabling WBGT estimation and real-time heat maps for various regions and times. In the future, we aim to create 
a new solar radiation correlation equation that enables WBGT estimation on cloudy days based on weather 
information such as clear and cloudy skies obtained from meteorological data. In addition, the physical 
characteristics of 3D models of geological objects have yet to be considered. We plan to analyze the impact on 
WBGT by the reflectance and transmittance of surrounding buildings’ surface materials and vegetation, 
considering the findings of prior CFD (computational fluids dynamics) studies. Incorporating the results of these 
analyses into the model will allow for a more accurate and realistic assessment of the thermal environment. 
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IMPROVING SENSE-MAKING FOR CONSTRUCTION PLANNING 
TASKS USING VISUAL AND HAPTIC STIMULI IN VIRTUAL REALITY 
ENVIRONMENTS 


Ivan Mutis, Marina Oberemok, & Nishanth Purushotham 
Illinois Institute of Technology, Chicago, Illinois, United States of America 


ABSTRACT: Design documents, drawings, and specifications are visual representations that are fundamental and 
prevalent in today’s construction engineering practice. Construction specialties (e.g., structural, mechanical) rely 
on these visual representations to express and draw meaning during collaborations. Construction engineering and 
management (CEM) students must acquire the knowledge, skills, and abilities — a key example of which is 
perceptual competence —for interpreting visual representations to facilitate efficient task execution, such as 
planning. Empowering learners with new technology using robust real-world immersion and interactive features 
is a significant step towards this target. The presented research explores new human-machine interactions to 
determine the best way for CEM students to learn through the combined senses of sight and touch. The approach 
merges visual and haptic interactions within an immersive environment to enhance perception and reasoning skills. 
The research demonstrates how CEM learners interact with and interpret the meanings of information within a 
planning task. It explores how VR and haptic technology augment the ability to recognize meanings — a new type 
of representational competency — for improved interpretation of information related to components with respect 
to engineering disciplines and sub-systems in a CEM, and investigates learners’ problem-solving ability by using 
perception-rich enhanced virtual reality (VR) and haptic affordances. 


KEYWORDS: haptic cues, human-computer-interaction, design interpretations 


1. INTRODUCTION 


To satisfy the educational needs of STEM learners and foster essential 21st-century skills, such as critical thinking, 
reasoning, problem-solving, collaboration, and communication, educators must integrate innovative technology 
into the learning process (NSF, 2020). To address these requirements, human-computer interaction (HCI) offers 
viable solutions to augment human senses and enrich sensory input, including vision, hearing, smell, and touch 
(Manchanda et al., 2017). 


The sense of touch or haptics is one of the most informative human senses. This sense includes both cutaneous 
and kinesthetic sensations. Embracing haptics opens up new possibilities to expand human capabilities, such as 
improving manual dexterity and enhancing sensory perception (Chryssa & Julie-Ann, 2020). This research takes 
advantage of the HCI affordances and explores the use of haptic technology in learning for Construction 
Engineering and Management (CEM) students. 


Fundamentally, to explore the use of haptics in CEM learning, the presented approach draws on an individual’s 
spatial-temporal cognitive ability (STCA) (Mutis, 2018a). Spatial-temporal ability allows learners to effectively 
manage and comprehend significant amounts of spatial (how design components are related to one another in the 
3D space) and temporal (the logic in a process, such as the order, sequences, and hierarchies of the resources within 
a construction task) information (Mutis, 2018a). Limited or no ability to process spatial and temporal information 
(1.e., lack of spatial and temporal cognitive ability hinders the understanding of designs and management of the 
varying local conditions (e.g., unplanned conditions) (P. Antonenko & I. Mutis, 2017; P. D. Antonenko & I. Mutis, 
2017; Mutis, 2014, 2015; Mutis, 2018b). The ability helps learners to conceptualize three-dimensional 
relationships between objects in space and mentally manipulate them as sequential transformations over time. 


The STCA cognitive ability allows the CEM learners to recognize meanings and facilitates coupling observed 
representation to the given contexts — a new representational competency. The coupling abilities (spatial and 
temporal) significantly benefit the decision-making process. Individual spatial-temporal abilities are associated 
with high cognitive reasoning that defines the cognitive-processing chain — from basic visual attention to higher- 
level reasoning, such as an interaction between organizing, performing, and supervising the effectiveness of a plan 
(Mutis, 2018a). For instance, planning is a highly cognitively demanding task where STCA plays a pivotal role. 
Planning is critical as the learner couples observed representation in a given context to organize, perform, and 
supervise the effectiveness of a plan while interpreting information from engineering designs. Effective STCA 
training enables individuals to instantly identify concepts, events, and patterns for comprehension and projection, 
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streamlining actions, solutions, and implementations in planning. 


The presented approach explores the uses of haptic technology to augment cognitive capabilities, in particular the 
STCA. The STCA augmentation effect is from the cognitive load reduction by using a new sensing channel 
(haptics) in the cognitive process by liberating mental resources for other cognitive tasks (Sweller, 1988; Sweller 
et al., 2019), potentially enhancing spatial and temporal processes that are fundamental in problem-solving tasks. 
The assumption is that learners can rely on their haptic sense to reduce efforts of converting cognitive processes 
into physical actions— alleviating the burden of effort for processing spatial (e.g., spatial configurations of design 
components in the 3D space) and temporal (e.g., the logical sequence of design components for their assembly) 
information. The use of new senses (haptics) is a form of increasing the impact of embodied intervention in the 
cognitive process by, for instance, facilitating tracking information and gaining object rotations to feel and 
comprehend spatial relations more accurately (Tran et al., 2017). 


By using perception-rich enhanced virtual reality (VR) with haptic affordances, this study addresses the following 
questions: 

1. What aspects of haptic stimulus impact the learners’ development of representational competence for 
better interpretation of information related to designs in a CEM? The research outlines the importance of 
improving spatial-temporal skills to facilitate high-level reasoning in complex situations. 

2. What new HCI factors, combining visual and haptic (VH) interactions with engineering designs, enrich 
the perception and reasoning skills of CEM learners, leading to more accurate and efficient task 
execution? The solution presents a haptic language that implies tactile cues enhancing spatial awareness 
for the given context. 


2. BACKGROUND 


Researchers in STEM education are exploring the ways in which haptic technology can enhance the learning 
process, including improving student engagement, conceptual understanding, and skill acquisition. Early studies 
focused on developing haptic devices for enhancing spatial awareness and visualization skills (Liu et al., 2003; 
Williams Ii et al., 2001). Later research underlined the benefits of haptic feedback in improving interactions and 
spatial guidance (Jong, 2014; Takahashi et al., 2009). As demonstrated in further publications, augmenting VR 
with haptics increases overall task performance and the users’ perceived sense of presence (Cooper et al., 2018; 
Kreimeier et al., 2019). 


Over the years, haptic interventions in architecture, engineering, and construction (AEC) have been applied to 
simulate assembly tasks (Medellin-Castillo et al., 2015) and develop vocational training for construction personnel 
such as carpenters, plumbers, and masons (Jose et al., 2016; Ranjith et al., 2014). Current research aims to cultivate 
more sophisticated haptic devices and techniques for human-machine interaction in AEC, including haptic 
feedback for mixed reality and teleoperation (Adami et al., 2022). 


In general, haptics is extensively used in engineering learning, including training, physics and chemistry 
simulations, robotics, and automation (Prabhakaran et al., 2022; Sanfilippo et al., 2022). Engineering education 
utilizes haptic interfaces to provide students with hands-on experience with virtual simulations. Likewise, 
vocational training with haptics provides realistic practice in handling heavy machinery and tools. Lastly, by using 
haptic devices on remote-controlled construction robots, operators are able to discern the properties of various 
objects and materials during the manipulation (Alakhawand et al., 2022). Thus, haptic technology shows promise 
to transform traditional learning and training methods, offering advantages such as enhanced knowledge retention, 
engagement, skill acquisition, safety, and accessibility (Mastrolembo Ventura et al., 2022). 


Several studies have been conducted on assembly techniques, but only a few have explored the incorporation of 
haptics due to their relative novelty as an assistive tool in STEM learning. However, the development of haptics 
shows potential for enabling innovative approaches to enhance cognitive and motor skills, particularly in tasks like 
modeling, assembling, and teleoperation. For virtual assemblies, Yuan et al. (2008) introduced an augmented 
reality (AR) approach, utilizing a virtual interactive tool called VirIP and a visual assembly tree structure (VATS). 
This system enables assembly operators to seamlessly follow a pre-defined assembly plan/sequence without 
requiring sensor schemes or markers on the assembly components. Hu and Zhang (2012) presented a method 
leveraging a 3D game engine and software component technique to rapidly construct a reusable component library 
to develop virtual assembly experiments. In recent work, Li et al. (2020) proposed a framework with advanced 
computations such as runtime degrees of freedom (DOF) determination, disassembly directionality computation, 
and assembly/disassembly sequence generation. These computations efficiently integrate assembly constraint 
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information into a virtual assembly application with minimal effort required. 


Haptic technology allows the transfer of touch-based information between humans and computer interfaces (OED, 
2020). Haptics can enhance the learning experience and support an environment that cultivates student engagement, 
motivation, and interest in the subject matter (Tytler, 2020). Haptic interaction is crucial for a sense of presence 
and manipulating objects in remote or virtual environments with manual dexterity (Kortum, 2008, p. 25). For 
example, by providing users with tactile cues, haptics makes the digital environment more interactive and 
informative. 


In the AEC discipline, there are proposed haptic interventions that aim to assist users in accomplishing an 
engineering task providing guidance for the decision-making process. Rahimian and Ibrahim (2011) proposed a 
haptic-based VR 3D sketching interface to improve novice designers’ engagement with “problem-space” and 
“solution-space”, leading to increased artifact maturity in collaborative conceptual architectural design. Following 
Christiand and Yoon (2011) work, haptic-path sequence guidance reduces the assembly time and the travel distance 
that enhances the working performance of virtual assembly tasks. Also, the availability of haptics in large 
immersive environments can contribute to future advances in virtual assembly planning and factory simulation 
(Pavlik et al., 2013). Yeh et al. (2013) suggested that multi-symbolic representations (text, digits, and colors) in 
haptics-enhanced virtual reality systems have the potential to help collaborative work effectively. James et al. 
(2019) proposed a bi-manual haptic interface for skill acquisition in surface mount device soldering. Coffey and 
Pierson (2022) demonstrated the effectiveness of the proposed haptic guidance system for co-navigation of non- 
holonomic vehicles through teleoperation. Williams et al. (2023) presented a framework for active haptic guidance 
in mixed reality using one or more robotic haptic proxies to influence user behavior and deliver a safer and more 
immersive virtual experience. 


The primary focus of the mentioned studies was to improve the understanding of a process for training (e.g., the 
process of assembling building components). The studies have incorporated haptic guidance into the assembly 
processes, which helped users receive tactile feedback during the assembly tasks. However, the haptic guidance 
implementations fell short in providing spatial awareness and addressing high-order cognition in cognitively 
demanding tasks such as identification of the dependencies or hierarchy of building components for planning. 
While the haptic guidance aids in recognizing information about movements in training tasks through haptic 
feedback, the approaches do not offer a comprehensive understanding of the entire spatial context or 
interconnections between various building components. The presented study aims to overcome these limitations 
by exploring spatial-temporal cognitive abilities using visual and haptic stimuli. 


Haptic feedback 


Using electronic devices, we encounter multiple interactions, including sounds, flashes, and buzzing haptics 
(Müller, 2020). Such a combination of sensory stimuli allows the user to be fully engaged in the experience, which 
enriches the overall quality of the interaction. A crucial aspect of this set is haptic feedback, which draws from the 
psychological nature of interaction with the environment and other humans (e.g., social touch). Therefore, 
achieving precise replication of haptic signals in devices requires a deep comprehension of how humans perceive 
and attribute meaning to tactile interactions to portray their semantics accurately. 


The human skin’s discriminative ability arises from a dense network of cutaneous receptors allowing us to 
differentiate fine touch, pressure, texture, and temperature (Fulkerson, 2020). This adaptability of touch perception, 
known as adaptation rate, enables us to prioritize novel sensations while filtering out constant stimuli. Unlike some 
other senses perceived passively, haptic perception is inherently interactive and bidirectional — we actively explore 
and manipulate the environment to extract tactile details. 


To recreate physical sensations, HCI incorporates various types of haptic technology, including force, vibrotactile, 
ultrasonic, thermal, and other forms of haptic feedback (Hatzfeld et al., 2015). Haptic interfaces allow users to 
experience tactile sensations while manipulating objects, discriminating textures, and applying forces in the virtual 
and physical environment. 


According to the literature (Adilkhanov et al., 2022), haptics performs three primary functions such as simulation, 
teleoperation, and guidance. Through simulations, haptic feedback imitates physical interaction with the 
environment and its attributes to heighten the realism of learning scenarios. In teleoperation, the haptic interface 
provides a two-way communication channel between a robot and an operator, allowing the operator to perceive 
tactile feedback from the robotic tool (Luo et al., 2019). As part of the guidance process, haptics implement tactile 
patterns to derive directional cues to the user (Huang et al., 2019). 
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Guiding haptics becomes especially beneficial for facilitating the decision-making process and fostering problem- 
solving abilities by providing tactile cues to assist users in performing tasks or enhancing interactions in a physical 
and virtual environment (Bluteau et al., 2008; Feygin et al., 2002). This haptic function utilizes touch-based 
sensations to provide the users with real-time information, helping them make informed decisions and improving 
their overall performance and understanding of the context. Research suggests that using intuitive haptic guidance 
to assist the movement reduces errors (Mugge et al., 2016). Moreover, a partial-then-full haptic guidance strategy 
seems the most effective in improving learning outcomes (Teranishi et al., 2018). The most common applications 
of guiding haptics include vibrotactile feedback, often incorporated into commercial smartwatches for haptic 
notifications and alerts. 


Haptic guidance can be achieved through a haptic code that utilizes touch-based symbols (e.g., haptic icons or 
“hapticons” (Enriquez & MacLean, 2003)) to instantly deliver information to the user via vibrations, pressure, or 
movement (Hatzfeld et al., 2015, p. 75). According to Enriquez et al. (2006), haptic code has to meet the following 
conditions in order to offer explicit meaning: 
e Differentiable: All haptics must be distinct from one another when presented either alone or in any 
common haptic combinations. 
e Identifiable: Once a meaning has been connected to a stimulus to form an icon, it must be simple to recall. 
e Learnable: The associations between meanings and stimuli should be intuitive and easily remembered. 
The elementary functions of the haptic code include providing notifications with neutral feedback and 
signals with either positive or negative meaning in response to the user’s actions. 


Haptic code can be applied even on a broader spectrum, e.g., for rendering abstract models or concepts as a new 
modality for communication. At the lowest level, haptic devices notify users of an event, their identity, or their 
current state or contents. A higher level of abstraction implies haptic associations that allow the users to identify 
interdependencies and determine a sequence of actions by assigning physical sensations to an object hierarchy. 
Accordingly, systematic, perceptually guided haptic design can support expressive and nuanced communication 
that qualifies as a new haptic language. 


3. METHODOLOGY AND APPROACH 


The study consists of two main phases: (1) the creation of the experimental training platform, designed to be 
interactive and informative; (2) experimentation, with active student participation for practical application and 
assessment of the learning outcomes. This comprehensive approach provides an effective and engaging program 
for students to develop their skills and comprehend complex building concepts in a virtual and immersive 
environment. The presented research is the first phase, including a case example to illustrate the approach. 


Immersive virtual platform 


The VR design consists of the development of a VR environment based on the detailed design of a building project 
(e.g., a small residential building). See Fig 1. The design was represented in a Building Information Model (BIM) 
with at least a Level of Detail 300. The BIM model contains rich data on engineering systems through represented 
objects or component assemblies, such as quantity, size, shape, location, and orientation. The design was exported 
as an Industry Foundation Class (IFC) file to preserve the semantic information of the building components. The 
exported model was then imported into Unity for two purposes. First, it acts as a reference point in the form of a 
translucent building, allowing the user to place building components accurately. Second, it is semantically broken 
down into corresponding building components to build game objects. The resulting structures were game objects 
created based on the standard categorization of the building into Sub-Structure and Superstructure and further 
classified into Structural, Architectural, and Mechanical components. 


It is critical to note that the created game objects were set for true building scale, generating an immersion that 
represents dimensions for easy manipulation in the VR environment. Each game object had a representation that 
described data and text information in a structured format, involving attributes as game elements based on IFC 
structure. For example, each object game had data related to the activity (used for planning) in their element 
attributes (element descriptors). The element attributes held in addition to the planning activity information 
associated with unique haptic feedback, as discussed in the section below. For a logical representation in planning, 
game objects were nested based on a work breakdown structure (WBS)— a hierarchical tree structure subdividing 
the deliverables and work. The WBS disciplines will deliver the work specified in each work package—the lowest 
level in the WBS that represents a specific amount of work. The work package as product and deliverable has a 
VR object representation. The structure of these components is shown in Fig. 1. 
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Fig. 1: Design structure of the VR platform. 


Experimental Platform 


Unity software integrated with a VR headset Oculus Pro and a full set of haptics devices (see 2) were used for the 
development. This state-of-the-art platform provides users with a fully immersive experience. An example of the 
visualizations is shown in Fig. 5a and Fig. 5b. They illustrate a dashboard and virtual design components where 
users learn virtual manipulation, featuring an informative activity pane to hold building components as activity 
tiles and servicing as a comprehensive reference model for planning activities to enhance the overall learning 
experience. 


Fig. 2: Haptic devices for the VR platform. 
Haptic (vibrotactile) code 


The researchers systematically structured the haptic code as feedback for the simulation and experimentation in 
VR environment. The code contains the logical patterns that guide the user’s manipulation of the building 
components through interaction augmented by haptic feedback. The code has signatures expressed as haptic icons, 
i.e., a haptic icon is a brief haptic stimulus associated with meanings. The haptic icons were designed to intuitively 
comprehend cues about a function of the object and interact (user-object effects) in the virtual environment. The 
code is a form of primary language wherein each icon is a constant pattern with associated semantics. The learners 
(users) are required to get familiarized with the code (akin to learning a primary language to operate a system) a- 


priory. 
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To associate semantics to the haptic code, four key perceiving haptic features play a crucial role in defining the 
tactile experience: 

e Intensity. It governs the strength or magnitude of the tactile sensation delivered to the user. It determines 
how strong or weak the haptic feedback feels, allowing for the creation of subtle or intense tactile 
perceptions. 

e Sharpness. It relates to the perceived abruptness or distinctness of the haptic sensation. It influences 
whether the sensation feels smooth or sudden. 


e Duration. It refers to the length of time of the perception of haptic feedback. Short durations can convey 
quick events, while longer durations can simulate prolonged interactions or sustained sensations. 


e  Granularity. It is determined by the frequency of impulses and their spacing. The more granularity, the 
more rapid the impulses. 


Manipulations with these haptic features enable prototyping and fine-tuning haptic experiences to match specific 
interactions and simulation scenarios, enhancing user engagement and immersion in virtual environments. The 
combinations of the haptic features assigned to a haptic device evolve into distinctive haptic patterns. 


Haptic code (vibrotactile) types 


The haptic code consists of two types of haptic feedback: operational and functional. 
Operational 


It refers to haptic feedback of the basic human-computer interaction (HCI) with the elements of the virtual 
environment, such as feedback on actions on the system components (to select, cancel, move, etc.). The approach 
includes three types of operational haptic feedback: 
e Positive to reflect the correct actions of the user by giving soft impulses with low or medium intensity; 
e Negative to associate the mistakes and has more even rigid impulses, medium or high intensity; 
e Neutral to provide alerts to the user regarding updates or notifications (it is presented as a row of short 
impulses with gaps in between). 


Functional 
It refers to the feedback that gives semantics associated with activity planned in VR deployment. 


Parameters of duration (D), granularity (G), intensity (I), and sharpness (S) define the functional haptic code. The 
combination of parameters defines features that indicate semantics. The combination can be represented in a two- 
dimensional matrix of n rows (where n is the number of combinations). See Fig. 3. Each row represents the 
distribution of values of parameters (D, G, I, S). 


A VR object will have an associated haptic code combination (DGIS), representing a specific value and semantics. 


Fig. 1 illustrates the approach conceptualization of the intersecting components (virtual environment, structure of 
VR objects, haptic (vibrotactile) code, semantics haptic feedback (as semantics), and the spatial temporal cognitive 
ability (while interacting with problem solving in CEM). The arrangement impacts the spatial-temporal cognitive 
abilities of learners, assisting them in accurately defining the sequence of activities through the integration of 
visual and tactile cues. 
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Fig. 1: Approach conceptualization. 


Case Example 


Learners are required to plan the construction of a small building design section in the VR environment (see Fig. 
2). The user should build the plan by identifying construction work packets (associated components and activities). 
A work package (construction product deliverable) serves to establish a coherent and feasible subdivision of tasks 
within the construction project. Each packet has associations with physical areas (work zones) to cover all the 
components of the design. 


A work breakdown structure (WBS) that incorporates the components and activities associated with the small 
building design (see Fig. 2a) is presented as a dashboard in the VR environment (see Fig. 5a). The WBS is used as 
a baseline for planning. The first milestone is set for substructure completion of the building design, and the second 
is set for the superstructure (see Fig. 2b). Each building component from the design is the deliverable ofan activity. 


The assembly sequence for each activity and packet (construction product deliverable) is based on the Finish-To- 
Start (FS) inference (logical relationship between two activities). A finish-to-start relationship implies that the 
predecessor activity needs to be finished before any subsequent actions can start. 
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Fig. 2: Construction product deliverables from work breakdown structure (WBS). 
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After the user selects the packet from the dashboard in the virtual environment (see Fig. 5a), the next step is to 
select, drag, and drop the design component (deliverable) of the small building on a virtual layout by performing 
a virtual walkthrough (see Fig. 5b). 


(a) A dashboard servicing as a reference model to visualize (b) Snapshot of the mapping between the work package and the 
WBS packages for conceptual planning. reference model (virtual layout of the building). 


Fig. 5: Interactive haptic activity for a planning task in the VR environment. 


The order of activities corresponds to the deliverable sequence, and each deliverable has an associated location in 
the virtual layout. The assumption for the case example is that the presented activities depend on the completion 
of others before they can begin (FS precedence). Planning these activities takes the dependencies (precedents) into 
account by arranging activities in a logical sequence. The arrangement of all deliverables is the planning of the 
construction section of a small building design in the VR environment. 


The users need to locate (by dragging and dropping) all the deliverables of the building section in the virtual layout 
space during the virtual walkthrough. By completing all the packets in the dashboard, the user can complete the 
planning of the building. 


Haptic feedback is an interactive feature that responds to the actions of the users within the VR environment — 
i.e., certain actions generate a type of haptic feedback with associated code (meaning). When the user drags and 
drops a deliverable on its selected location, there are two potential haptic feedback: functional and operational. 
Thus, operational and functional haptics complement each other to assist with the understanding of the semantics 
and ensure proper placement of the system components. Operational haptic feedback is on basic human-technology 
interaction, while functional haptics are systematically organized and tailored to specific semantics that indicate 
hierarchical structures. 


For example, if the deliverable is placed correctly, operational haptic feedback (positive operational feedback using 
soft impulses with low or medium intensity) would indicate a code that will inform the user that the correct location 
was correctly selected. However, if it is misplaced, operational feedback with the associated code (negative 
operational feedback using rigid impulses, medium and high intensity) is given to indicate to the user the error of 
displacement. Another example of operational feedback is positive when the user reaches a designated milestone 
while finalizing the packets from the WBS. Otherwise, negative feedback is given — indicating that more 
selections are required for planning. Operational haptic (vibrotactile) is produced by the haptic sleeves, which 
offer feedback for component manipulations like selection and canceling, as well as the haptic vest and feet, which 
are responsible for delivering notifications, success signals, and failure alerts. Code examples of operational 
feedback are shown in Table 1. 
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Table 1: Operational haptic feedback code 


Events Meaning Haptic feedback Intensity Duration [ms] 

Select Medium intensity, medium sharpness, low 0.2 short 150 
A component is taken and short duration, two short impulses F 

medium 0.5 short 200 


with increasing intensity 


Cancel A component is thrown Medium intensity, medium sharpness, medium 0.4 short 100 


and short duration, two short impulses low 0.2 short 150 


with decreasing intensity 


Notification Generating another component Medium intensity, medium sharpness, medium 0.4 short 50 


two short impulses 


Error Implementation ofa component High intensity, sharp vibration, high 1.0 middle 400 
meets the constraints medium duration 

Success The component is applied Short burst of impulse, high intensity, high 0.7 short 100 
medium sharpness high 0.7 short 150 

Epic success A milestone is accomplished High intensity, medium sharpness high 0.7 Short 100 
Failure A task is failed Five short bursts of impulse with high 1.0 short 200, 
overlay, max intensity, high sharpness 250, 

300 


Functional haptic feedback provides semantics related to reasoning in problem-solving, involving analytical tasks 
for planning. Of particular interest is the user’s understanding of the relationships between design components in 
the physical space. An example of a relationship is the priority for construction, assembly, or installation of the 
design components in the physical space. Reasoning on the relationship demands spatial and temporal cognitive 
abilities (STCA). The aim of functional haptic feedback is to assist the user’s reasoning (spatial and temporal 
reasoning) when required. An example is providing a better comprehension or awareness of the order for 
construction and assembly among two or more design components—by featuring STCA— as shown in Fig. 5a 
and 5b. 
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(a) Design components (as defined in the WBS) (b) Spatial and temporal information on the design 


components: an instance of relationship among components 


Fig. 6: Spatial and temporal reasoning on design components (using STCA). 


Functional haptic feedback informs the user which elements possess the highest priority for their construction and 
assembly (i.e., some design elements have higher priority than others to make their construction feasible and 
efficient). ga, for instance, illustrates the building components— without any spatial and temporal information. 
Fig. b illustrates spatial and temporal information — the relationship among the objects in the physical space, by 
establishing the priority and order for their construction. Haptic code will help the learner to reason on spatial and 
temporal information using a combination of duration (D), granularity (G), intensity (I), and sharpness (S) features. 
For example, a combination of values from the parameters D, I, and S will inform the order distribution in a 
spectrum (e.g., from the lowest to the highest value or from the highest to the lowest value). Consequently, each 
component on the final level has its unique haptic code (DGIS) comprising values for each parameter. 


The functional haptic (vibrotactile) feedback is related to information on the hierarchy of construction activity 
sequencing. Interaction with each component is assigned with unique feedback, which allows the user to easily 
discriminate the components one from another based on their semantics by selecting them from the WBS of the 
building (Fig. 2b). Due to the perceptive haptic nature of hands, functional haptics is assigned to the haptic gloves. 
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4. CONCLUSION 


The presented study describes an exploration of new human-machine interactions to determine the effects of 
learning through the combined visual and haptic modalities in VR environments. The interactions with an 
immersive environment involve engineering design comprehension for planning activities— framed in a problem- 
solving task. The study presents the technology environment using VR and real-time haptic feedback for 
experiencing problem-solving tasks — by complementing semantics of visualizations (e.g., 3D designs) with 
haptic feedback (e.g., vibrations) for a CEM task. 


The approach to building a VR environment with dual interactive mode (visual and haptic) facilitates the creation 
of new forms of understanding problems in planning, a highly cognitively demanding task where STCA plays a 
pivotal role. Learners map VR visual and haptic features to domain (CEM) problems and build solutions to the 
planning problem. They used VR technology (headset and controllers) to engage embodied perceptuomotor 
information by interacting with visual and haptic representations. For example, users navigate the 3D design in 
VR to approach locations of interest, allowing iterations between representations and reflection while problem- 
solving. In future work with a higher number of testing subjects, it is expected to demonstrate that haptic feedback 
(haptic code) effectively informs the learners of the semantics of the components for the planning task, enabling 
the learner to infer conditions in a virtual scene. 


The technology’s pedagogical features will make design information from multiple engineering specialties readily 
available for haptic and visual perception in a stepwise process to learn planning tasks. The technology will 
facilitate learning through observation and VR movements of design components. The approach uses work packets 
(construction product deliverables) that would enable scenarios of learning about understanding deliverables as 
chunks of workload for planning—the smallest unit that can be planned and managed for construction operations. 
By enabling learning with a work packet focus, the approach facilitates understanding of planning by framing 
control into a process (set of steps for delivery) of construction (assembly). The method provides opportunities for 
the learner to assimilate complex simulated realities of the physical space and develop spatial-temporal cognitive 
ability. Spatial-temporal ability allows learners to effectively manage and comprehend significant amounts of 
spatial (how design components are related to one another in the 3D space) and temporal (the logic in a process, 
such as the order, sequences, and hierarchies of the resources within a construction task) information. 


The insights collected from this study underscore the significant potential of the VR and haptic cues to enhance 
the learners’ perception of a problem’s conditions that are not visible to the learner. Further exploration of 
technology experimentations will allow researchers to draw conclusions on the learners’ perceptual competence 
and problem-solving capabilities, thereby contributing to the formation of project engineers with high levels of 
productivity in the construction industry. 
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ENHANCING THE REALISM OF VIRTUAL CONSTRUCTION SAFETY 
TRAINING: INTEGRATION OF REAL-TIME LOCATION SYSTEMS 
FOR REAL-WORLD HAZARD SIMULATIONS 


Kilian Speiser, Kepeng Hong & Jochen Teizer 
Department of Civil and Mechanical Engineering, Technical University of Denmark, Denmark 


ABSTRACT: In numerous studies, virtual training for construction safety has been proposed as a promising 
approach. However, creating realistic training scenarios requires significant resources, encompassing various 
elements such as sound, graphics, agent behavior, and realistic hazards. Digital Twins have revolutionized this 
process, and although so far, on a conceptual level only, significantly reducing the associated workload, it is still 
not exploiting its full potential. In this work, we propose a novel approach that leverages Real-time Location 
Systems (RTLS) data to simulate the real-world behavior of construction workers and equipment within Virtual 
Training Environments (VTEs). We aim to create training scenarios with dynamic real-world instead of hardcoded 
made-up hazardous events. To achieve this, we propose an extension to our Digital Twin for Construction Safety 
(DTCS) framework that now integrates (a) trajectory data streams of construction personnel and equipment and 
(b) technical specifications of the construction site work environment, including location and geometry of terrain 
and surface objects, to simulate real-world hazards in virtual safety training scenarios. Our further contribution 
is a case study application to explore the DTCS training capacity. Applying a logical filtering algorithm, we can 
process the RTLS data and ensure that the movements of the workers and equipment within the virtual environment 
are as realistic and representative as within the real world. This then enables the creation of realistic hazards that 
trainees can encounter in the training phase. Preliminary results with trainees suggest that the proposed work can 
have a high potential to enhance the realism of safety training, especially when they need to experience human- 
machine-related interactions safely. However, further work is required to create more responsive learning 
environments where the equipment follows real trajectories but also responds intelligently to the trainees' actions. 
By leveraging real-time data and advanced visualization technologies, we bridge the gap between the physical 
and virtual realms, enabling trainees to interact and navigate within a realistic virtual environment. 


KEYWORDS: Construction Safety, Digital Twins, Education and Training, Game Engines, Internet of Things, 
Learning Environments, Mixed and Virtual Reality, Real-Time-Location System, Real-world Hazards. 


1. INTRODUCTION 


The numbers of occupational injuries and fatalities in the construction industry remain high despite significant 
investments in safety measures. Working on construction sites is one of the most fatal workplaces in the United 
States (BLS, 2022). The industry has introduced several approaches to increase safety. Generally, they can be 
separated into three categories: (1) Prevention through design and planning, (2) Right-time intervention, and (3) 
Prevention through training and education. Training in simulated environments has become more popular over the 
last few years. Among others, an advantage of virtual training is that the trainee can practice tasks in a safe 
environment without hazards where mistakes cannot lead to injuries. Several studies investigated virtual training 
for construction safety using Virtual Reality (VR) with head-mounted displays (Fang et al., 2014; Hilfert et al., 
2016; Wolf et al., 2019; Jacobsen et al., 2022; Sacks et al., 2013; Jelonek et al., 2022), or desktop-based virtual 
training (Speiser & Teizer, 2023a). More recently, Bükrü et al. (2019) and Wolf et al. (2022) developed the concept 
of Augmented Virtuality (AV) in construction safety training. Noteworthy in their study is the use of real hand- 
powered tools to generate haptic control and feedback in a virtual learning environment made for construction 
trainees and not necessarily anymore for academic student participants. These studies highlight that it requires 
significant efforts to create such training environments independently of the used technology. To ease the creation 
of the training environment, Golovina et al. (2019a) use Building Information Modelling (BIM), and our previous 
research introduced a data model for a digital twin, indicating a significant decrease in resources for generating 
the training scenes (Speiser & Teizer, 2023b). Still, most studies developed hard-coded scenarios where hazards 
are artificial. Hence, there is potential for more realism with less effort by including additional data sources and 
realistic hazards. 


Despite VR being adopted in various industries, the terminology remains ill-defined. VR is often associated with 
an immersive experience using head-mounted displays. Some studies consider desktop-based experiences as VR 
(Wang et al., 2018), while others speak of fully immersed systems that utilize a head-mounted display (Kim et al., 
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2017). Milgram et al. (1994) first introduced a continuum describing the different mixtures of reality and virtuality. 
In this continuum, VR defines an environment where only virtual elements exist. At the same time, Augmented 
Reality (AR) describes systems where most elements are real and a few are virtual. This continuum intended to 
classify Mixed Reality (MR) experiences with visual displays. However, since 1994, several studies have 
developed other kinds of displays to include haptics (Azmandian et al., 2016), multidimensional sound (Savioja 
& Svensson, 2015), and scent (Yanagida, 2012). Integrating such displays creates a new level of immersion. 


Based on the latest developments, Skarbez et al. (2021) revisited the continuum from Milgram et al. (1994) and 
proposed new indicators considering the latest developments in new technologies: (1) the Extent of World 
Knowledge (EWK), (2) IMmersion (IM), and (3) COherence (CO). The system's immersion defines the level of 
immersive feedback to actions. Skarbez et al. (2021) conclude that a fully immersive system must realistically 
respond to all human senses. EWK describes what objects are part of the virtual experience and how they are 
represented. Nowadays, the Internet of Things (IoT) can provide advanced information and replicate real elements 
in virtuality more accurately. CO indicates how consistently the system reacts to the users' intentions. Game 
engines provide functionalities such as realistic lighting or gravity to make virtual environments coherent. 


As mentioned before, creating virtual training in construction safety is time-consuming and requires realistic 
hazards. Much of the work is dedicated to developing realistic training scenarios where machines perform realistic 
tasks and move accordingly to represent realistic hazards. IoT can provide such real-world data, and game engines 
provide real-world physics to achieve coherent experiences. While previous studies have used game engines for 
creating virtual experiences, no previous work integrated real-world hazards from IoT devices. This study proposes 
a novel method for bridging this gap by streaming data from IoT devices into a Virtual Training Environment 
(VTE). The objective is to create more realistic training scenarios with hazards from real-world data. The method 
increases the EWK as well as the CO of these systems. The remainder of this paper describes the relevant research 
gap and introduces the framework integrating IoT devices. Second, a case study validates the proposed method 
using two training scenarios before summarizing the results and concluding with future work. 


2. RELATED WORK AND IDENTIFIED RESEARCH GAP 


While virtual training for construction safety has shown promise in improving workers' safety awareness (Adami 
et al., 2023), a significant research gap exists. The current state of virtual training for construction workers lacks 
the integration of real-world data from IoT devices, which hinders higher levels of realism in the training 
experiences. A literature review revealed that studies have explored the use of game engines for simulating real- 
world physics (Juang et al., 2011), BIM (Golovina & Teizer, 2022), and digital twins (Speiser & Teizer, 2023a; 
Teizer et al., 2024) to create virtual training scenarios for construction safety. These approaches have enabled the 
development of more interactive and immersive training environments, allowing trainees to practice tasks safely. 
However, to date, no previous work has integrated real-world data obtained through IoT into virtual safety training 
to expose the trainees to realistic hazards despite the potential benefits (Salinas et al., 2022; Zoleykani et al., 2023). 


Several studies on Real-Time-Location-Systems (RTLS) exist in construction as they can monitor the precise 
location of objects. Park et al. (2017) detected hazard exposure in workers using Bluetooth Low Energy (BLE), a 
technology enabling low-power communication between devices. Chae & Yoshida (2010) introduced an approach 
to prevent collisions with heavy construction machinery using radio-frequency identification (RFID). Teizer et al. 
(2008) used Ultra Wideband (UWB) for tracking construction resources and later for visualizing worker and gantry 
crane trajectory data in a first-of-a-time real-time VR learning environment for ironworker trainees (Teizer et al., 
2013). Narumi et al. (2018) stressed the applicability of Real-Time-Kinematic Global Navigation Satellite Systems 
(RTK-GNSS) for teleoperating construction equipment. 


This research bridges the research gap and unlocks crucial advantages by incorporating real-world data from RTLS 
devices into the VTEs. First, the IoT data will significantly increase the realism of the training simulations as the 
trainees virtually experience accurate information about the construction site conditions, worker locations, and 
equipment status. Second, the IoT data will enable the realistic reproduction of hazardous events for the trainee 
and, therefore, make the performance assessment more meaningful. Third, processing the data with appropriate 
algorithms will enhance the coherence. Construction sites are dynamic environments with numerous interacting 
elements. Incorporating data from IoT devices will allow the VTE to respond dynamically to changes in real-world 
conditions, thereby creating more coherent and contextually relevant training experiences. 
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In summary, the identified research gap is the lack of integrating real-world data from IoT devices into VTEs for 
construction safety. The proposed research aims to address this gap by developing a novel method that leverages 
RTLS data to enhance the realism, coherence, and practicality of virtual training scenarios, ultimately contributing 
to improved safety measures and reduced occupational injuries and fatalities in the construction industry. 


3. DIGITAL TWIN FRAMEWORK AND REAL-TIME DATA PROCESSING 


Figure 1 illustrates the proposed framework enabling real-time training for construction safety that is based on the 
DTCS proposed by Teizer et al. (2024) but focuses on components related to virtual training. 
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Fig. 1: Digital Twin framework integrating IoT data into virtual training environments. 
3.1 Input data 


The core of the framework is the VTE that utilizes the Project Status Knowledge (PSK) from the DTCS to virtually 
represent the construction site. The PSK contains the BIM model, a landscape model, the required resources, and 
a construction schedule. The PSK also encompasses a hazard library defining currently existing hazards such as 
restricted working areas, fall hazards, or moving machinery. These hazard zones geometrically describe areas 
where workers are in danger. Our previous work proposed a data model integrating hazards and safety regulations 
ina VTE (Speiser & Teizer, 2023b). The hazards can either be automatically detected by algorithms evaluating the 
PSK using safety regulations or can be modeled manually. This framework assumes that the VTE receives 
geometrically representable hazard zones. The landscape model contributes to realism by providing real 
surroundings. The requirements for the landscape model only concern the geometry for rendering purposes. For 
instance, a mesh from photogrammetry or laser scans may suffice. 


3.2 RTLS data processing 


The core novelty of this study represents the integration of real-world resources into the VTE to enhance realism 
through EWK and CO. Such resources include human workers, machinery, or materials. This work focuses on 
human workers and heavy machinery, such as wheel loaders or excavators. The representation of these resources 
in the virtual world utilizes geometrical descriptions as well as RTLS sensors to localize the resources in real-time. 
We expect the framework to function for all types of RTLS systems once the data quality is at a high level. 


RTLS data provides spatial-temporal information, which allows us to localize a resource at a timestamp. This 
information increases the EWK of the virtual environment as the virtual objects are placed at the real location. 
However, RTLS does not provide further knowledge about the state of the resource (e.g., orientation of the 
resource). Such knowledge is essential for MR experiences to generate coherent experiences. The real-time 
processing module generates knowledge about the state of a resource and simulates the motions realistically using 
technical specifications of individual resources, physics, and logic. 
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The technical specifications help to create a realistic object of the resource and load it into the MR scene. Besides 
the correct geometry, it also includes the possible motions of the resource. For instance, how does a machine steer, 
or can the machine move backward? The Real-time Processing module utilizes this information together with 
physics and logic statements to simulate the motions of the resources realistically. Physics integrates physical laws 
such as gravity or sound emissions, and the logic statements exclude irrational simulations. For instance, a railway 
wagon cannot rotate by 180° but changes direction, while a forklift may rotate by 180° without moving the location. 
The real-time processing module prioritizes realistic visualizations over the accuracy of the sensor data. We would 
rather misplace the element by a tolerance compared to the recorded data simulating abrupt motions. 


3.3 Output: Performance assessment and personalized feedback 


The performance analysis module collects data on how the trainee interacts with hazards throughout the training 
experience. Utilizing the hazard library based on safety regulations, this module constantly checks for violations 
of safety rules from the trainee. Once such violations are detected, further algorithms can evaluate the severity of 
the violation. For instance, Golovina et al. (2019b) introduced an approach to classify safety violations. To 
implement such a method in this framework, the digital twin must provide the location and geometry of hazard 
zones. Previous research has proposed virtual training for collecting such data (Golovina & Teizer, 2022). 


Once the training ends, the performance data is processed, and personalized feedback is generated. Personalized 
feedback to trainees has various benefits. Among others, learning has shown better efficiency when trainees 
understand what they did and how they can improve (Pianta et al., 2012). The feedback must summarise and assess 
the trainee's performance and graphically describe potential improvements. The feedback is also shared with the 
trainer, who can compare different performances. 


4. CASE STUDY 


To validate the proposed framework, we conducted an experiment in an infrastructure project in Munich, Germany. 
The study comprises five steps: (1) Collecting RTLS data from construction resources, (2) generating the game 
scene in Unity, (3) processing the RTLS data, (4) simulating the resources in the Unity scene using the processed 
data, and (5) evaluation of the simulation. The following sections describe how we tested the framework and 
finished with the required changes in order to provide (near) real-time training. 


4.1 Reality: RTLS data collection 


We collected the RTLS data at the staging area for a subway track replacement project in Munich. The collection 
lasted for three days during the early stage of the project. During the collection, we observed and tracked multiple 
tasks, such as unloading materials from the truck and arranging and loading materials onto rail cars. The tasks 
involved resources of both pedestrian workers and construction equipment. 


The RTK-GNSS solution was used for the location data collection as it performs accurately in outdoor 
environments. The RTK-GNSS solution consists of two components: a base station and rovers. The base station is 
placed statically in open space, and workers and equipment carry the rovers. Compared to a single GNSS solution 
whose accuracy is affected by atmospheric delays or clock errors, the RTK-GNSS uses the base station to provide 
correction for rovers so that the workers and equipment are located with cm-level accuracy (Wielgocka et al., 
2021). The accurate location information reduces the work of data processing and filtering when importing it into 
the training environment. In addition, it can cover a wider tracking area with a simple setup than other locating 
methods such as BLE or UWB. 


Given that traffic from rail cars and construction equipment occurred within a limited area, the logistics at the 
staging area could be packed and complicated. Therefore, pedestrian workers must receive sufficient realistic 
safety training to train to work in such an environment. To test the proposed framework, we recorded a task where 
two construction workers moved materials from a storage area to a rail wagon. Figure 2 illustrates the scenario: 
One worker operates a forklift, and the other worker assists the equipment operator. The work lasted for 90 minutes. 
The pedestrian worker carries an RTK-GNSS module, and the forklift has a module mounted on the roof of the 
forklift, centrally placed on top of the operator's seat. 
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C Loading area 
(==) Unloading area 


Fig. 2: The forklift moves material from the loading area to a railway wagon while a pedestrian worker assists. 


Figure 3a shows the trajectories of the two resources during the 90 minutes. The RTK-GNSS rovers streamed the 
location data to a database with a frequency of 5Hz for the machine and 1Hz for the worker. During that time, the 
worker supported the equipment operator by attaching the material bags to the forklift in the loading area and 
removing them in the unloading area. Figure 3 shows that frequent interactions between workers and equipment 
were inevitable when working simultaneously in a limited area. The 3D plot in Figure 3b visualizes the 2- 
dimensional movements of the forklift and the worker over time within the loading area to stress the close 
interaction between the forklift and the worker. This visual eases the spotting of proximity events, which entail 
that the worker was too close to the forklift. In this study, we defined too close proximity once a worker enters a 
bounding box with a one-meter distance to each side of the forklift. We ran an analysis based on an existing 
approach to detect such proximity events (Golovina et al., 2019b). The worker entered the 1m bounding box 28 
times during the time. We will use this performance as a reference for the trainees in our training scenarios, but 
the results also indicate that this framework is also applicable for safety monitoring or assessment of construction 
resources as the game engines provide large libraries for the demand of the previously mentioned applications. 
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Fig. 3: (a) 2D-trajectories of the forklift and the worker during a work task over 90 minutes with 28 proximity 
events, and (b) the trajectories, including the time-axis limited to the unloading area for 15 minutes. 


4.2 From reality to virtuality: RTLS data processing 


The collected data consists of a set of locations (x,y,z) with the corresponding timestamps. The locations refer to 
the coordinate system WGS84, a common GNSS localization system. As the Unity scene refers to ETRS89, we 
transformed the WGS84 data into ETRS89 using Pyproj (Pyproj Contributors, 2023). Based on this data, we can 
visualize the individual states for each recorded point in the Unity scene and locate it at the real-world location. 
The second component of data processing connects the individual points and moves the resource coherently with 
a realistic speed, orientation, and motions. For instance, the wheels rotate, or the axle turns once steering. Figure 
4 illustrates a problem: The trajectory from a machine implies that the machine first moved forward, then stopped, 
and returned backward. To include such logic in the framework, we need to make assumptions and technically 
convert them into an algorithm. This specific forklift steers with the rear axle, which must also be considered when 
simulating the motions. We make the following logical propositions: 


Proposition 1: The forklift only moves forward or backward and not sidewards. 

Proposition 2: The forklift changes directions if and only if it is moving. 

Proposition 3: The wheel of the forklift spins if and only if the forklift is moving. 
Proposition 4: A pedestrian worker only walks forward. 

Proposition 5: Distances of less than 10 cm between consecutive points are considered noise. 
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Fig. 4: (a) the tracked forklift, (b) the recorded trajectory, and (c) the directed trajectory after the assessment. 


Some of these assumptions are not precise. For instance, humans can walk backward or sidewards, crawl, and 
jump, but it is rather difficult to detect such behavior purely based on the trajectory of a human. Hence, we limit 
the scope to forwards walking humans. The last assumption was made after the first implementation, where the 
authors noticed that the smoothness of the motions appears faulty when considering short movements. For instance, 
the worker probably standing would constantly change direction and move by a few centimeters. The following 
algorithm integrates these logical statements and moves the resources in the correct direction at a given speed. 


Algorithm 1: Move object 
Input: Resource, CurrentPoint, NextPoint, CurrentDirection 
Distance = |(NextPoint — CurrentPoint)| 
If Distance >= 0.1: 
NextDirection = NextPoint-CurrentP oint 
Duration = NextPoint.Time — CurrentPoint.Time 
Velocity = Distance/Duration 
If Resource is Machine: 
If |AngleBetween(CurrentDirection,NextDirection)| > PI/4 
MoveBackwards(NextPoint, Velocity) 
Else MoveForwards(NextPoint, Velocity) 
10 Else MoveForwards(NextPoint, Velocity) 


OANNDNBPWNH 


The proposed algorithm moves a resource to the next point with realistic speed and rotation. If the next point is at 
least 10 centimeters from the current location, the algorithm determines the required direction and the velocity. 
Depending on whether the resource is a machine or a human, the algorithms move the resource forward or 
backward. The methods MoveForwards and MoveBackwards in lines 7, 9, and 10 implement how the resource 
behaves when moving. Practically, this means that the resource is moved in every frame according to the velocity 
and distance. The methods also implement additional animations such as rotating wheels of the machine or body 
motions for the worker (moving the legs, swinging the arms). 


4.3 Virtuality: Training scene 


The virtual environment was generated in the game engine Unity. We used Unity as it provides a vast selection of 
assets and is simpler for conceptual work, while other game engines like Unreal outperform Unity with the graphics. 
The virtual environment comprises the components illustrated in Figure 5: (1) a landscape, (2) the BIM model, (3) 
additional objects to enhance the realism of the game scene, and (4) the moving resources connected to IoT devices. 


Fig. 5: Scene components: (a) landscape, (b) BIM model, (c) site equipment, (d) human worker, and (e) forklift. 
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Google's photorealistic tiles were added to the Unity scene using the asset Cesium to visualize realistic 
surroundings. This asset allows for adding geo-referenced objects and utilizes the WGS84 coordinate system, 
which is commonly used in GNSS applications. The tiles from Google shown in Figure 5a have two disadvantages: 
First, it is outdated, and second, the quality is low. For this use case, we consider it sufficient as it is not the focus 
of this study. A frequently updated mesh generated from a laser scan or photogrammetry may increase realism and 
ensure up-to-date surroundings. In the next step, we added the BIM model in the form of an IFC file. We envision 
the digital twin to provide BIM models of high quality. In this case study, a BIM model with few elements and a 
low level of development was available. The provided BIM model contains the physical structure of the site (see 
Fig. 5). A modified version of the Unity Asset [fclmporter was used to convert the IFC file into a Unity-readable 
format. The provided IFC file references the geospatial coordinate system ETRS89. Hence, the origin of the BIM 
model was converted into WGS8¢4 to place it correctly within the landscape model. The local origin of the Unity 
scene still relates to the ETRS89 origin of the BIM model. In this way, we refer to a Cartesian coordinate system 
and no longer to the geographic coordinate system WGS84, which eases simulating the resource. In the third step, 
the scene obtained additional objects to make the environment more realistic. Based on a site visit and a site layout 
plan, we added elements such as an office trailer, safety guardrails, or material storage. The site layout plan defines, 
among others, safe paths for the workers and spaces for the machines to operate. In the last step, the tracked 
resources are added, and the movements will be simulated based on the collected trajectories and the proposed 
algorithm in the previous sections. The moving resources represent the hazard for the workers, and as they are 
following the real-world data, the simulation is more realistic. 


5. EVALUATION 


The introduction described the problem of this research: Generating realistic scenarios for construction safety 
requires realistic hazards and realistic surroundings. Our framework proposes the use of RTLS data for integrating 
scenarios based on real tasks. The framework was implemented for the described construction site and tested with 
the 90 minutes data sample from the previous section in two training scenarios: (1) a training experiment with a 
student to validate that the created hazards are more realistic, and (2) collaborative tasks where the trainee assists 
the forklift. Before describing the training scenarios, an accuracy assessment evaluated the algorithm, simulating 
the motions of the resources. 


5.1 Assessment of simulated data 


We evaluate the accuracy of the RTLS data integration based on two indicators. First, we measure the deviation 
from the simulated data to the collected data in the real world. Second, we visually assessed the simulated data 
and compared it to a video recording. 


One of the main objectives of this work was to simulate hazards realistically using RTLS data. RTK-GNSS 
provides reliable and accurate data. However, the filtering algorithm processes the raw data to visualize motions 
coherently. This can generate misplaced hazards. Hence, the filtering algorithm was evaluated by comparing both 
the virtual trajectory to the real trajectory. For collecting the virtual trajectory, a Unity script streamed the location 
of the resource with 10Hz to a database. The virtual trajectory was then compared to the real trajectory. As the real 
trajectory was collected with 5Hz and 1Hz for the forklift and the machine, respectively, the time-wise closest 
point from the virtual trajectory was compared to the real point. There is already a little error in this comparison 
as the closest point can be up to 100ms apart. With a maximum speed of 30km/h, this can contribute to inaccuracy 
of up to 8.5cm. 


Table 1 summarizes the distribution of the deviation for both the worker and the forklift. The algorithm for the 
worker provides accurate results. With a standard deviation of 5.9cm and a 99 percentile of 29cm, the performance 
is very good. However, it is important to stress that the data was collected with 1Hz. The implemented algorithm 
aims to simulate the movements towards a given point. In between these points, we do not have evidence of what 
happened. Within this second, we do not know whether the worker turned around. Thus, data should be collected 
with a higher frequency in order to evaluate the realism of the simulation better. The accuracy of the simulation 
for the forklift is more meaningful for two reasons: The real-world data was collected with a higher frequency, and 
the algorithm moves the resource on interpolated paths, which entails a higher deviation. The mean deviation 
amounts to 13cm, and the standard deviation is 27cm. The median deviation amounts to 6.8cm, and the forklift 
was at least 40cm accurate during 95% of the time. The inaccuracy of the simulation relates to the low frequency 
of the collected data points. The authors conclude that a higher 30-100Hz frequency will enable a more reliable 
simulation. 
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Table 1: Deviations between the real trajectory from the RTLS data and the virtual trajectory. 


Resource Mean Standard Median 95 percentile 99 percentile 
Forklift 13cm 27cm 6.8cm 40cm 46cm 
Worker 10cm 5.9cm 7.6cm 21cm 29cm 


The second measure to evaluate the simulation was conducted manually. We compared 10 minutes of the 
simulation to a 7-minute video recorded on-site simultaneously. The video reveals that twice during the 10 minutes, 
the forklift was simulated moving backward while it was actually driving forward. This happened during a total 
time of 28 seconds, corresponding to 6.2%. Comparing the movements of the worker compared to the simulation 
seemed realistic. 


5.2 Training Scenario 1: Simultaneous tasks 


As we mentioned previously, virtual construction safety training requires personalized feedback. Research has 
proposed methods for collecting such data and can visualize it in a concise way for construction workers. In this 
research, we use the concept of safety parameters and collect data from the trainees when entering hazards using 
automatic data collection. In the same way as before with the worker in Section 4.1, a proximity event is triggered 
once the trainee enters the bounding box around the forklift. 


In the first training scenario, the trainee must collect various objects in the training scene and return them to a 
storage area. Meanwhile, the forklift and the pedestrian worker will follow the trajectories from the real-world 
data collection. During the training, the trainee needs to ensure that they will not trigger any hazards relating to 
the forklift. Figure 6a indicates such a situation: The trainee needs to cross while the forklift passes. Should the 
trainee get too close to the forklift, a close call is triggered, and data will be collected, which is processed for 
personalized feedback. The created training scenario lasts 10 minutes, where the trainee needs to collect seven 
objects. Figure 6b shows the results. The trainee crossed the road six times while the forklift was nearby. The 
visual indicates that the worker always identified the forklift while heading east. However, returning, the trainee 
was very close to the forklift twice. This data allows us to conclude that either the equipment operator should have 
more distance to the pedestrian cross or that the trainee could not see the forklift. Figure 6b also depicts the three 
proximity events of the real-world worker during this 10-minute excerpt. 
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Fig. 6: (a) Results of the first training scene, including the landscape, BIM model, resources, and trajectories 
from IoT devices, and (b) the trainee crossing the road while the forklift follows the real-world trajectory. 


5.3 Training Scenario 2: Collaborative task 


In the second training scenario, the trainee takes over the role of the worker. Hence, the pedestrian worker is 
removed from the game, and the trainee is advised to support the equipment operator in loading the forklift. The 
trainee must wait for the forklift and assemble the boxes by pressing "C" on the keyboard. To ensure the worker is 
safe, the trainee is advised to wait in highlighted areas to not collide with the machine. Figure 7b illustrates the 
safe area, and Figure 7a shows the results from the 90-minute task. During the 90 minutes, the worker followed 
the forklift to assist in transporting the boxes. We collected the data about the proximity with the forklift using the 
1m bounding box described before. When the worker collides with the bounding box, a proximity event is triggered. 
Figure 7a shows the 37 collisions with the bounding box in yellow and highlights three actual hits in red. The 
figure indicates that the actual hits occurred not in the loading area but when the forklift was approaching the 
worker while the worker walked to the unloading area. The forklift was emitting sound, and the game engine 
increased the volume when coming closer to the sound source. Still, the worker did not avoid the equipment. 
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However, it is likely that in real-world situations, the equipment operator would have stopped. These collisions are 
not realistic and reveal one major disadvantage of this approach: The machine is following the trajectory without 
reasoning. Hence, future work needs to investigate how to include such reasoning as the machine would most 
likely avoid the machine. Comparing the virtual performance of the trainee to the real-world worker, the trainee 
triggered more proximity events. There can be different reasons, but all yellow proximity events were triggered at 
the front. A reason could be that the forklift was stubbornly following the trajectory from the data collection and 
did not interact with the worker. There was no communication possible between the forklift operator and the 
construction worker. Hence, if the worker, for instance, did not finish, yet, mounting the box, the forklift will yet 
continue on the route and almost hit the worker. The authors propose two approaches to tackling this issue. First, 
the forklift may include intelligence, such as avoiding the worker while driving or waiting for the trainee to finish 
their task before moving. Second, a multiplayer game where another trainee takes over the role of the equipment 
operator could enhance realism and improve collaboration between the workforce. 
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Fig. 7: The (a) results from Training Scenario 2 indicating the trajectories and proximity events, and (b) safe 
waiting areas where the trainee can safely wait for the return of the forklift until its standstill. 


6. CONCLUSIONS AND FUTURE WORK 


In this paper, we addressed the challenge of creating realistic virtual training scenarios for construction safety. 
Despite the advancements in virtual learning, the integration of real-world data from IoT devices was largely 
missing, hindering the level of realism regarding hazardous situations. Our proposed framework leverages RTLS 
data to enhance the extent of world knowledge and coherence of VTEs, resulting in more realistic and contextually 
relevant experiences as the hazards relate to real-world scenarios. 


Through a case study conducted at a construction site in Munich, Germany, we validated the effectiveness of our 
framework. The integration of RTLS data allowed us to accurately represent the movements of construction 
workers and equipment within the virtual learning environment for safety training purposes. The data processing 
algorithms and logical propositions ensured realistic motions of the resources, further enhancing the coherence of 
the virtual environment. Additionally, we demonstrated the practicality of the proposed method by creating a 
realistic training scenario involving hazardous interactions between construction workers and equipment. The 
study also indicates potential in creating training scenarios for collaborative tasks between humans and equipment 
based on real-world data. 


Nevertheless, the framework requires a more responsive simulation where the equipment not only follows a real 
path but can also stop and continue based on the trainees’ behavior and feedback or avoid them when having clear 
sight. It also raises further research questions, for example, whether it would be better to create multiplayer games 
for collaborative work tasks rather than making equipment follow realistic but, eventually, for human learners, 
predictable travel routes. As work is underway, expanding our framework to support multiple trainees interacting 
and collaborating within the virtual environment would foster a more dynamic and engaging training experience, 
mirroring real-world construction sites' teamwork and coordination. 


This preliminary work successfully bridged the gap in virtual training for construction safety by integrating real- 

world data from IoT devices, but there are several avenues for future improvements: RTLS data should be recorded 

at a higher frequency than 1Hz, and additional sensors on relevant static or dynamic objects in the scenery could 

further enhance the realism. For the forklift, additional sensors could detect the vehicle's orientation or the fork's 

exact location and extension. This would ease the simulation of movements. In addition, a Body Motion Suit (BMS) 
for the construction worker may provide more information on how the worker executes a specific task, adding a 

high level of perhaps needed detail of relevance to some construction hazards. 
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In conclusion, our preliminary research efforts contribute to the advancement of virtual training for construction 
safety by leveraging IoT data to create more realistic and coherent training scenarios. As we continue to explore 
and refine the proposed framework, it has the potential to significantly improve safety awareness and reduce 
occupational injuries and fatalities in the construction industry. 
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VISIBILITY ENHANCEMENT OF CRANE OPERATORS USING BIM- 
BASED DIMINISHED REALITY 
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ABSTRACT: The limited visibility experienced by crane operators in construction sites poses significant 
challenges, leading to reduced performance and safety concerns. Obstructive elements, such as existing buildings, 
construction elements, or vehicles, can block the crane operator's field of view, hindering their ability to execute 
lifting operations with precision and confidence. To address this issue, this study presents a novel approach using 
Building Information Modelling (BIM)-based diminished reality (DR) to enhance visibility by dynamically 
removing obstructive objects from the crane operator's perspective in real-time. The research employs a marker- 
based registration system that effectively aligns BIM data with the physical environment, ensuring realistic and 
precise DR visualization. Additionally, a semi-automatic selection method that involves minimal intervention from 
the user is employed to select desired objects. To generate the background, the system utilizes real-time observation 
data from occluded areas. A validation through a case study demonstrates the practical applicability of the 
developed system in real-life construction scenarios. 


KEYWORDS: Diminished Reality (DR), Augmented Reality (AR), Mixed Reality (MR), Crane, Visibility, Building 
Information Modelling (BIM), HoloLens, Construction industry. 


1. INTRODUCTION 


Construction sites are inherently hazardous environments due to the presence of heavy machinery and large 
equipment. Among these, cranes play a vital role in the construction process. Crane operators may be required to 
operate when they do not have direct visibility of the load, which is referred to as a "blind lift". This type of lift 
has been recognized by the industry as one of the most hazardous activities, as it poses a significant threat to both 
personnel and nearby property. In general, reduced visibility in the working area can lead to lower operator 
efficiency and have an adverse impact on both the end product's quality and overall productivity (Price et al., 2021). 


Diminished Reality (DR) has emerged as an effective solution for overcoming occlusions by recovering 
background scenes and giving an unobstructed view of the workspace. Meanwhile, Building Information Modeling 
(BIM), which is a digital representation of the building geometry and information (ISO, 2015), can be beneficial 
in the DR process. BIM can integrate data from various data-capture technologies, such as laser scanners, Global 
Positioning System (GPS), and imaging sensors, to provide complete data about a construction project 
(Alizadehsalehi & Yitmen, 2016). Considering these features, BIM data can be used to create a digital 
representation of the background scene that is required in the DR process. 


In this study, we investigate the implementation of BIM-based DR to enhance crane operator visibility. Our 
proposed approach aims to facilitate a safer and more efficient construction environment by providing crane 
operators with a clear and unobstructed view of their work area. By utilizing BIM data and DR technology, we 
seek to improve awareness, empower operators to make informed decisions, and to elevate safety within the 
construction industry. The integration of BIM and DR holds the potential to significantly improve crane operations 
and enhance overall productivity and safety at construction sites. 


2. RELATED WORKS 
2.1 Occlusion handling in crane operations 


Various technologies have been developed to handle occlusion and to enhance visibility for crane operators. The 
most widespread approach used by the industry is to ensure that the crane operator remains in constant radio 
communication with either a rigger or a signal person, who can provide guidance throughout the lift. However, 
these methods of communication can be unreliable and cause various accidents (Mansoor et al., 2023). Many 
solutions have been developed to overcome this limitation and improve safety and efficiency at construction sites. 
For example, a crane monitoring system is presented in (Price et al., 2021) that can provide the crane operator with 
real-time 3D visualization and the ability to give and receive feedback during blind lift tasks. In this study, the 
safety warning system is also created based on a 3D model of the crane environment. This 3D model is developed 
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in real-time utilizing sensors, cameras, and laser scanners. An alternative approach involves the visualization of 
information using transparent displays (Sitompul et al., 2020). This information, which includes important details 
such as the height and weight of the lift, is displayed through head-down displays, which are installed near the 
operator's line of sight in the cabin. However, research has indicated that operators often pay minimal attention to 
the information presented on head-down displays (Wallmyr, 2017). This is primarily due to the placement of these 
displays far from the operator's line of sight, as they are positioned in a way that avoids hindering the operators' 
view. A major drawback of using these techniques is that a user is unable to view the information from their own 
point of view. 


2.2 Augmented reality in crane operations 


Augmented Reality (AR) can be used to combine computer-generated information with the user's view of the 
environment. AR systems may give the operator real-time feedback by superimposing valuable information such 
as the load weight, distance to the target, and other crucial data on their field of view, making AR a valuable tool 
for improving visibility for the crane operator, the surrounding area, and the operation to be carried out (Sitompul 
& Wallmyr, 2019). For example, (Yang et al., 2015) developed an AR system to assist operators by providing 
visual information, such as arrows. The findings indicated that the implementation of AR support led to a 
significant reduction in task completion time, as it allowed operators to perceive the environment more clearly and 
effectively. Moreover, it minimized collision frequency and enhanced the overall user experience, demonstrating 
the usefulness of AR in familiarizing operators with new environments (Yang et al., 2015). 


Nonetheless, despite the numerous benefits of AR techniques in crane operations, there are certain limitations that 
can be overcome by Mixed Reality (MR) techniques. One of the disadvantages of AR is that it may suffer from 
limited depth perception and occlusion issues. In AR systems, virtual objects are superimposed onto the user's 
view of the real world, but they may not always appear in the correct position relative to real-world objects, leading 
to misinterpretations and potential hazards (X. Li et al., 2018). 


(H. Li et al., 2022) presents a novel application of MR technology in the form of a night hoisting assistance system, 
highlighting the potential of MR for enhancing visibility and operational safety in crane operations. This system 
enables operators to perceive and interact with a virtual model of the hoisting process in real-time. The system 
offers variety of interaction modalities, including voice interaction, gesture recognition, and gaze tracking, 
allowing operators to intuitively manage and navigate the virtual environment. 


2.3 Diminished Reality 


Diminished Reality (DR), which is an advanced visualization technology for removing or reducing the visibility 
of objects in real-time, can go a step further by visually removing obstructive objects such as buildings, trees, or 
other equipment that may obstruct the operator's view of the workspace (Mori et al., 2017). Thus, DR can provide 
new opportunities for more accurate visualization for operators of heavy machinery such as cranes. (Aromaa et al., 
2020) introduced the concept of DR for generating see-through visualization, allowing the operator to perceive the 
machine's physical structure as transparent from their viewpoint (see Fig. | (a)). Instead of making the machine's 
cabin transparent, (Palonen et al., 2017) developed an alternative method for visualizing the view in front of the 
machine using point clouds (see Fig. 1 (b)). 


Opaque boom 


(a) (b) 
Fig. 1. (a) See-through visualization of the boom presented in (Aromaa et al., 2020) , (b) Visualization of the 


environment using point cloud presented in (Palonen et al., 2017) 
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Implementing effective DR solutions for operator visibility enhancement comes with a set of challenges that 
researchers and developers need to address. The main challenge of DR is obtaining reliable and accurate 
information about the hidden background, especially in dynamic construction sites where the surroundings may 
change frequently. Since DR aims to remove or reduce the visibility of obstructive objects, it requires access to 
real-time observation data or an accurate representation of the background scene to create a seamless visualization. 
Another challenge is the precise alignment of the virtual model with the real-world environment. For effective DR 
visualization, the virtual model must be accurately registered with the physical scene to ensure a seamless blend 
between the two. Achieving precise alignment often requires robust marker-based registration methods or other 
sophisticated tracking techniques, which can be complex to implement and may require specialized hardware or 
software. Furthermore, in scenarios where the background scene is dynamic and constantly changing, maintaining 
real-time updates of the hidden background information becomes critical. The DR system must continuously 
receive and process the latest observation data to accurately reflect any changes in the environment. This real-time 
processing can place significant computational demands on the system, requiring efficient algorithms and powerful 
hardware to handle the data in a timely manner. Overcoming these challenges and creating a seamless DR 
experience for crane operators requires sophisticated data processing techniques and a good understanding of the 
specific requirements of the construction site environment. 


3. PROPOSED DR SYSTEM 


The proposed system for enhancing crane operator visibility using BIM-based diminished reality allows for the 
seamless alignment of physical and virtual scenes, enabling the visualization of occluding objects and their 
removal from the crane operator's view in an MR environment. This approach aims to enhance visibility, safety, 
and situational awareness for crane operators in real-life construction scenarios. 


Using our proposed system, the crane operator, who controls the overhead crane from the shop floor and is 
equipped with a head-mounted display, interacts with the system using hand gestures to visually remove the 
sections where obstructive objects are present. The process begins with scanning the QR code markers, followed 
by alignment of the 3D virtual model onto the physical scene. Subsequently, specific objects within the virtual 
model can be selected. Afterward, the system seamlessly integrates the real-time video feed from CCTV cameras, 
showing the dynamic real-time background and further enhancing the operator's field of vision. 


The system architecture consists of three main layers, as illustrated in Fig. 2. The first layer involves data collection. 
The BIM model provides additional contextual information, such as the physical layout of the construction site, 
the positions of obstructive objects, the dimensions, and characteristics of the crane. The laser scanning system in 
combination with the BIM model can create the initial static 3D environment map. Accurate placement of QR 
code markers in the model ensures precise registration and tracking. Subsequently, real-time data is collected from 
video streams captured by CCTV cameras placed strategically in the environment. A data integration and 
processing layer includes both the alignment and DR processing modules. In this layer, the aligned BIM model is 
integrated with the real-time observation data from the CCTV cameras. Through this integration, the dynamic 
updating of the DR visualization is achieved, ensuring a seamless and accurate representation of the background 
scene. The last layer involves the visualization of the enhanced scene in a MR environment. In our implementation, 
a Microsoft HoloLens 2 headset was utilized to present the MR visualization to the crane operators, allowing them 
to perceive the virtual and physical elements seamlessly. The visualization module provided the crane operator 
with an enhanced and contextually accurate representation of the construction site. The headset's advanced hand 
and gesture recognition capabilities enabled precise and responsive interaction with the MR environment. Crane 
operators could easily manipulate and navigate the virtual content using natural hand gestures, allowing for 
efficient and fluid control over the DR visualization. 


169 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


~ Data Integration and 


Data Collection Layer 
(Č BIM Model `} 


oint cloud Di 


Real-time Data 
DR 


\ Streaming / 


Alignment Module 


~ -QR Code Detection 
-Model Algnmen S Q 


| -Object Selection 
Object Removal 


User Interface Layer 


Interaction Module 


i 
li 
-Hand Tracking } i 


Processing Layer ' 


-Gesture Recognition 


Processnig Module 


Visualization Module 


Microsoft HoloLens 2 


q 


J L User 


Fig. 2. System architecture for enhancing crane operator visibility 


The system is implemented using C# programming in the Unity 3D environment. Unity 3D facilitates integration 
of video streams into a virtual environment. The OpenCV framework is employed to execute real-time image 
processing algorithms on the frames, enabling efficient and responsive operations. Additionally, a Wi-Fi 
connection is established between the HMD (Head Mounted Display) and the CCTV cameras. This wireless 
network connection allows seamless video streaming to the HMD, ensuring real-time visualization of the 


environment. 


Fig. 3 shows the process flow of the generated prototype system, which will be elaborated in the following 


subsections. 
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Fig. 3. DR process flow 


3.1 Virtual Model Generation 


First, the virtual model of the environment is generated using the BIM model in combination with reality capture 


techniques. The 3D scanner plays a vital role in 


reality capture by providing highly accurate and precise point 
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clouds of the environment. It captures intricate details and geometries, ensuring that the virtual model is an accurate 
representation of the real-world site. Compared to the Structure from Motion (SfM) method (utilized in (Inoue et 
al., 2018)), which heavily relies on the quality and number of images taken to reconstruct the background, the 3D 
scanner captures data directly from the environment, minimizing the dependency on photo quality and providing 
a more robust solution. Furthermore, accurate scene reconstruction can be highly challenging when applying the 
SfM method to complex construction environments. The 3D scanner, with its high precision, can handle such 
complex environments more effectively, leading to a more reliable and detailed background generation. Then, 
Autodesk 3ds Max is used to create a high-quality 3D model of the scene using the BIM model and point cloud 
data. This 3D virtual model can then be optimized and converted into a low-polygon model suitable for real-time 
rendering in game engine environments. In addition, the location of the QR code is defined in the virtual model in 
this step for the marker-based registration. The physical QR code marker is subsequently placed in the correct 
location within the environment. 


3.2 Virtual Model Alignment 


As indicated in the previous subsection, to adequately align the virtual model in the physical world, QR code 
markers are used. A Microsoft HoloLens 2 headset tracks the camera's position, ensuring accurate alignment of 
the virtual and physical scenes. Vuforia image target technology (Vuforia Enterprise Augmented Reality (AR) 
Software | PTC) plays a crucial role in achieving precise alignment between the virtual and physical scenes in this 
study. As the user scans the QR code marker using a headset equipped with the Vuforia engine, the system identifies 
the unique image target and establishes a reference point. By recognizing the image target (the QR code marker 
stored in the Vuforia database), Microsoft HoloLens gains an understanding of its position and orientation in the 
real-world environment. 


3.3 Object Selection 


Users can interact with the virtual model by selecting objects they wish to remove from their view. Upon selection, 
information about the object, including its metadata transferred from the IFC model, is displayed in the user's view. 
This interactive process allows for a more user-friendly and intuitive experience. 


3.4 Object Removal 


The process of object removal involves several steps in an MR environment. First, the system captures real-time 
video streams from CCTV cameras, which provide a view of the target environment, including obstructive objects. 
Using the interactive HoloLens interface, the operator can select a region of interest, which includes obstructive 
objects like walls. The frames captured by the CCTV camera are transmitted in real-time, accompanied by 
annotation information, including the camera's pose at the time of each frame. After any distortions are repaired, 
these frames are decoded and uploaded as textures to the GPU (Graphics Processing Unit) of the headset device, 
enabling the generation of a DR view. The image warping process is then initiated, identifying corresponding 
points between the selected region in the operator's view and the frames coming from the real-time video stream. 
By calculating a transformation matrix based on these points, the system precisely aligns the background view 
with the real-world environment from the crane operator's perspective. As a result, the selected obstructive objects 
are visually replaced with the corresponding background from the virtual model. The HoloLens application renders 
this augmented view, providing the crane operator with an unobstructed and clear representation of the 
environment. The entire process happens in real time, updating when the crane operator moves or changes their 
perspective, resulting in better awareness of the situation and informed decision-making during complex lifting 
operations. 


3.5 Visualization 


The final MR visualization, presented through the headset, seamlessly combines real-world observation data from 
CCTV cameras with the DR-processed view. Obstructive objects, previously removed using image warping, are 
no longer present in the operator's field of view, ensuring an unobstructed and clear perspective. This MR 
visualization empowers the crane operator with real-time and accurate information. 


4. CASE STUDY 


In this case study, we conducted initial steps for the validation of our developed system in a real-world setting at 
a prefabrication factory's shop floor located in Montreal, Canada. The manufacturing of prefabricated modules is 
done on the factory’s production floor, with distinct zones and the presence of cranes for material handling (see 
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Fig. 4 (a)). The type of crane used in this study is an overhead crane, which is defined by its ability to move along 
rails that are located overhead, thereby offering flexibility in the lifting and handling of material. The system's 
capabilities in improving operator visibility and safety during crane operations are the focus of this case study. The 
crane operator is equipped with a remote controller to manipulate the crane's movements and operations. The 
modules are placed in close proximity on the shop floor due to a lack of space. The operator's viewpoint is 
obstructed by this setup, which reduces their ability to see crucial parts such as the hook of the crane. In these 
situations, the operator requires the presence of additional workers near the hook to help manage the entire 
operation. The integration of the overhead crane with the proposed BIM-based DR system provides a solution to 
overcome the challenges of limited visibility faced by crane operators. As shown in Fig. 4 (c), the CCTV cameras 
were strategically placed around the module. Fig. 4 (b) shows the 3D virtual model of the prefabricated module. 


(a) (b) (c) 


Fig. 4. Experimental area; (a) Physical factory environment, (b) Virtual model of one module, and (c) Factory 
and camera settings in Unity environment. 


Point clouds of the environment are collected by Leica Cyclone REGISTER 360, as illustrated in Fig. 5 (a), (b). 
The point cloud in combination with BIM model helps us to generate a low-polygon virtual model of the scene 
(shown in Fig. 4 (b)). The process began by placing a QR code marker at the same location in the physical scene 
as in the virtual model. When the crane operator wore the HoloLens and scanned the QR code marker, the 
HoloLens accurately tracked the camera's position and orientation, ensuring precise alignment between the virtual 
and physical scenes. 


(a) (b) 


Fig. 5. Point clouds of the environment collected by Leica Cyclone REGISTER 360; (a) Scanning stations; (b) 
Point cloud data of the target module. 


Fig. 6 (b) illustrates the final result of the DR process within the HoloLens 2 environment through a screenshot of 
the user interface. In the screenshot, the actual prefabrication shop floor is displayed, and obstructive objects are 
highlighted as regions of interest. Crane operators can use the HoloLens 2's gesture recognition capabilities to 
select specific obstructive elements by drawing regions of interest around them using natural hand movements. 
Once the regions of interest are selected (red dash line in Fig. 6 (b)), the DR visualization algorithm processes the 
data in real-time to remove the obstructive objects from the operator's view. 


Fig. 6 (a) shows the operator's view prior to the application of the DR process, providing as a reference point for 
the visual change affected by DR, as shown in Fig. 6 (b). 
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Transferred view from 
CCTV cameras 


Region of Interest 
| Hook 


(a) (b) 


Fig. 6. HoloLens view, :(a) before the DR process; (b) after the DR process. 


5. DISCUSSION 


The application of BIM-based DR for enhancing crane operator visibility presents a promising solution for 
addressing challenges related to obstructed views in construction sites. The proposed system effectively combines 
advanced visualization technologies and real-time data integration to provide crane operators with a clearer and 
unobstructed view of their operational scene. 

The integration of BIM data and real-time observation from CCTV cameras improves the DR visualization's 
accuracy. This integration tackles the issue of reliable background information by providing a continually updated 
view of the surroundings via real-time video streams. This ensures that crane operators are provided with a realistic 
and up-to-date representation of the construction area. 


Despite the demonstrated effectiveness of the developed system, some limitations and challenges have been 
identified. The accuracy and reliability of the DR visualization heavily rely on the quality and availability of real- 
time observation data. In addition, factors such as changing lighting conditions and dynamic physical environment 
can influence the accuracy of tracking and registration. Future research can explore potential solutions, such as 
leveraging advanced imaging technologies or integrating cutting-edge technologies, including sensors and cloud 
solutions. For example, sensors such as LiDAR (Light Detection and Ranging) can be used to create detailed 3D 
maps of the environment to help operators in navigating complex environments, detecting obstacles, and 
improving situational awareness. Position sensors, such as GPS (Global Positioning System), can precisely track 
the crane's location. Cloud solutions can also be used for data storage and accessibility, providing a centralised and 
secure repository for storing large amounts of sensor data such as photos, videos, and sensor readings. In addition, 
cloud-based analytics tools can process sensor data in real-time, providing valuable insights to crane operators. By 
overcoming these challenges, we can further enhance the precision and reliability of the system, opening up new 
possibilities for improved crane operator visibility and safety at construction sites. 


6. CONCLUSION 


This research investigated a BIM-based DR approach to enhance crane operator visibility and safety at 
construction sites. By dynamically removing obstructive objects in real-time, the proposed system offers crane 
operators an unobstructed view of the construction scene, significantly improving their visibility and decision- 
making. The seamless integration of BIM data and real-time observation data enables a realistic and accurate DR 
visualisation. While our developed system shows promising results, further investigation is needed to address 
limitations such as the quality of real-time observation data and challenges related to registration and tracking. 
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ADAPTING BIM-BASED AR POSITIONING TECHNIQUES TO THE 
CONSTRUCTION SITE 


Khalid Amin, Grant Mills, Duncan Wilson & Karim Farghaly 
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ABSTRACT: While Building Information Modelling (BIM) can support the management and visualisation of 
construction projects, Augmented Reality (AR) holds great promise to enhance interaction with these complex 
models. The accurate positioning of BIM-AR models in construction sites is critical to ensure that the virtual and 
real-world environments are correctly aligned. Through a literature review, this paper presents a review of state- 
of-the-art positioning techniques. It explores the different techniques used to position BIM-AR models and 
understands the interconnections and differences between them, with an emphasis on their applicability to the 
construction industry. The review also explores the challenges and limitations of each technique, in terms of the 
trade-offs between accuracy, computational efficiency, and robustness in varying environments. By providing an 
overview of positioning techniques in BIM-AR, this paper aims to guide researchers and practitioners in assessing 
the suitability of these techniques in the context of construction sites. The insights gained from this review may 
inform the development of efficient BIM-AR platforms that are more aligned with the dynamic and complex nature 
of construction sites. 


KEYWORDS: BIM, Augmented Reality, Positioning 


1. INTRODUCTION 


The construction industry is constantly looking for innovations and new methods to improve collaboration and 
productivity (Schiavi et al., 2022). Building Information Modelling (BIM) is a recent innovation in information 
systems that has proven its value for the construction industry. Currently, the use of BIM is a common practice in 
the built environment (Amin & Abanda, 2019), although its use in the construction stage is limited (Nassereddine 
et al., 2022; Sidani et al., 2021). Recent advancements in immersive visualisation technologies have created new 
prospects for exploring the potential of site-based BIM settings. The research on immersive visualisation 
technologies such as Augmented Reality (AR) and Virtual Reality (VR) has been growing with the aim of 
improving collaboration, productivity, and output quality of construction projects (Schiavi et al., 2022). In 
particular, the integration between BIM and AR can bridge the gap between the site and the office by enabling 
access to the BIM models onsite (Schiavi et al., 2022; Sidani et al., 2021). The potential of BIM-based AR 
integration is attributed to the potential improvements in collaboration and onsite information retrieval and 
representation (Wang et al., 2014). However, the majority of studies tend to investigate general AR applications 
that do not depend on the utilisation of BIM. The implementation of BIM-AR depends on complex software 
architecture and sophisticated positioning techniques which are not necessarily needed in general AR applications 
(Amin et al., 2023). The focus on BIM-AR should provide a deeper understanding of the specific benefits and 
limitations of the technology in the context of the practical requirements of real-life situations. 


In the context of BIM-AR in the construction stage, the accurate positioning of 3D models onsite remains a major 
challenge and remains one of the most active research subjects (Azuma et al., 2001; Servières et al., 2021; Van 
Krevelen & Poelman, 2010). Positioning refers to the system’s ability to accurately localise and track the BIM 
model with the proper alignment, orientation, and elevation (Amin et al., 2023). Numerous studies have explored 
a multitude of positioning techniques, employing different hardware and software components (Nee & Ong, 2023). 
The choice of the suitable technique is usually driven by a trade-off between accuracy, computational efficiency, 
and the region of space in which the system should work properly (Rolland et al., 2001; Servières et al., 2021). To 
decide whether a specific positioning technique is more effective for a specific use case, it is important to have a 
global understanding of the enabling technologies of positioning. In addition, it is important to explore how the 
effective management of positioning BIM-AR models in construction sites can have implications on the existing 
responsibilities and skillset of existing BIM roles. Hence, we adopt a literature review to survey state-of-the-art 
positioning techniques in the construction stage and develop a better understanding of their uses and limitations in 
the context of construction sites. 


The motivation is to gain a comprehensive overview of the various positioning techniques in BIM-AR to capture 
the nuances and interconnections between them. This should help better understand their uses and limitations in 
the context of the dynamic and complex nature of construction sites. Such an understanding should provide insights 
into the development of BIM-AR platforms that are tailored to meet the demands of such challenging environments. 
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In addition, we discuss how the effective management of positioning BIM-AR models in construction sites requires 
revisiting the existing structure of BIM roles including prospects for new responsibilities and skillsets. 


2. LITERATURE REVIEW 


AR encompasses a wide range of positioning technologies and methods, each with its own set of intricacies and 
considerations. Capturing the subtle differences and interconnections of these techniques requires a global 
understanding of Computer Vision, sensor technologies, and algorithm development. Positional tracking systems 
can be grouped under Outside-In and Inside-Out (Gourlay & Held, 2017). Outside-In systems utilise external 
stationary sensors or cameras (trackers) to track feature points -such as light emitters- that are mounted or 
assembled into the tracked device (Gourlay & Held, 2017; Pustka et al., 2012). The main drawback of Outside-In 
systems is that the accuracy and stability of tracking are limited by the space the trackers can cover (Gourlay & 
Held, 2017). On the other hand, Inside-Out, the system uses the cameras and sensors that are assembled in the 
device to map the environment and estimate its local pose (Figure 1). It is believed that Inside-Out systems have 
an advantage over Outside-In in AR because the former requires less environmental setup and enables a more 
dynamic experience (Gourlay & Held, 2017). Inside-Out is the dominant positioning technique used in 
smartphones and modern AR headsets. However, grouping positioning techniques of AR into inside-out and 
outside-in only partially describes the vast pool of approaches, hardware and software components used. It is more 
common to group AR positioning techniques under three categories based on the type of sensors: sensor-based, 
vision-based and hybrid (Amin et al., 2023). This classification is adopted by others (Rolland et al., 2001; Zhou et 
al., 2008), however, positioning techniques need to be understood with a wider BIM-AR function focus (Amin et 
al., 2023). Williams et al. (2014) provide an important application case study, which we update and extend in this 
study. This study expands on the systematic literature review in Amin et al. (2023) to develop and update a 
comprehensive map of BIM-AR positioning techniques in the construction stage. We provide a detailed description 
of each category and develop an understanding of the interconnections and differences between them in the context 
of the nature and requirements of construction sites. 


Figure 1 Inside-out systems rely on a group of cameras and/or sensors manufactured into the headsets. These 
systems do not depend on any external sensory information. 


3. RESULTS 
3.1 BIM-AR Positioning Techniques 


Regardless of the selected technique, any BIM-AR positioning system will need to do two tasks: estimate the local 
pose (location and orientation) of the user and construct a map of the surrounding environment (Wang et al., 2014). 
This happens through a two-stage process: a learning stage and a tracking stage. The learning stage comprises 
understanding the surrounding environment and recognising its features to create a spatial map that serves as a 
foundation for accurate tracking (Nee & Ong, 2023). The tracking stage is where the system initialises the 
coordinate system, localises the model in six degrees of freedom (6DOF), and monitors the changes to its location 
and orientation relative to the environment (Choi & Park, 2021; Zhou et al., 2008). To achieve accurate positioning 
of BIM-AR models, many techniques and approaches have been developed. Positioning techniques in BIM-AR 
can be grouped under three categories: sensor-based, vision-based and hybrid (Azuma et al., 2001; Billinghurst et 
al., 2015; Palmarini et al., 2018; Servières et al., 2021). Manual mapping is an additional technique that is not 
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frequently mentioned in the literature due to its limitations (Amin et al., 2023). The dependency map in Figure 3 
shows the different positioning techniques used in BIM-AR and the interconnections among them. The next 
subsections discuss how the technology works in each category in detail and describe the associated techniques 
and their limitations. 


3.1.1 Sensor-based systems 


Sensor-based tracking refers to the process of determining the position and orientation of a user or device within 
a real-world environment by utilizing various sensors (Rolland et al., 2001, Williams et al., 2014). Compared to 
vision-based tracking methods, sensor-based tracking is faster and more robust in determining the pose of the 
device, however, they are analogous to open-loop systems whose output could accumulate errors (Zhou et al., 
2008). Several types of sensors are commonly used in sensor-based tracking in BIM-AR: 


1. Inertial Sensors: Inertial sensors are commonly used in tracking systems for BIM-AR, usually as 
complementary sensors to visual ones. The most common inertial sensor used for pose estimation is 
Inertial Measurement Unit (IMU) (Nee and Ong, 2023). IMUs can provide accurate information for all 
six degrees of freedom about the pose of the device, usually by fusing information from integrated 
gyroscope, accelerometer and magnetometer (Ahmad et al., 2013). While inertial sensors provide high 
accuracy in short-term tracking, they suffer from error accumulation over time and need to be combined 
with other sensors for accurate tracking (Rolland et al., 2001, Williams et al., 2014, Nee and Ong, 2023). 
IMUs have become an essential component in all smartphones and modern AR headsets. 


2. Laser-based Depth Sensors: also referred to as optical sensors, utilise different kinds of light in the 
infrared spectrum to measure depth information to understand the geometry of objects and surfaces in 
real-time, also known as depth sensors. Among several types of depth sensors, the Time-of-Flight (ToF) 
and Light Detection and Ranging (LiDAR) are the most commonly used techniques in BIM-AR (Amin 
et al., 2023). A ToF laser sensor emits an infrared laser beam and measures the time it takes to reflect 
back to measure the distance to an object (Rolland et al., 2001, Williams et al., 2014). LiDAR scanners 
are usually more expensive because they can cover larger areas and provide higher accuracy. 


3. GPS: is a widely used sensor for outdoor localization. It leverages a network of satellites to determine 
the latitude, longitude, and sometimes altitude of a device. GPS enables location-based AR where digital 
elements are superimposed based on where the user stands as in Williams et al. (2014). Very few studies 
have utilised GPS to position BIM-AR models (Fenais et al., 2018; Williams et al., 2014) due to its low 
accuracy and low performance indoors. 


4. Wireless Network Sensing: such as Wi-Fi or Bluetooth. They can be utilised for determining the location 
of the device but have significant limitations related to setup, coverage and accuracy (Craig, 2013, 
Williams et al., 2014). A single study utilised Wi-Fi for BIM-AR positioning (Degani et al., 2019). 


Other sensors that are frequently mentioned in general AR literature but are not used in BIM-AR in the construction 
stage are magnetic sensors and acoustic sensors. Magnetic Sensors, also known as magnetometers, detect changes 
in the Earth's magnetic field to determine the orientation of the device. Due to the existence of magnetic fields 
apart from the earth's soft iron effect and temperature changes, magnetometers are highly susceptible to magnetic 
disturbances. As a result, their use is often disregarded in various applications, particularly in industrial settings as 
construction projects (Rolland et al., 2001, Nee and Ong, 2023). Acoustic Sensors utilise the principle of ToF used 
in optical sensors but use sound waves instead of laser beams. The speed of sound varies with environmental 
conditions and sound waves can be easily obstructed, so acoustic sensors are not a reliable tracking technique 
(Rolland et al., 2001, Nee and Ong, 2023). It is argued that the sole reliance on sensor-based techniques would 
introduce significant error variables (Craig, 2013). This is due to some requirements that are not always available 
on construction sites such as network coverage, and due to their sensitivity to some environmental conditions such 
as temperature, humidity, and noise. In addition, measurement errors are accumulated over time and need 
continuous calibration because pose estimation is evaluated based on the previous position (Craig, 2013). 


3.1.2 Vision-based systems 


Vision-based techniques rely on different computer vision methods to locate and track targets within a video 
sequence or a series of images (Jinyu et al., 2019; Serviéres et al., 2021; Williams et al., 2014; Zhou et al., 2008). 
A major advantage of vision-based techniques is that they rely on cameras which provide an affordable solution 
to capture lots of information, in addition to being available in many forms and types (Song & Kook, 2022; Yang 
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et al., 2023). Vision-based tracking techniques use image processing methods to estimate the pose of the camera 
relative to the real world and so are analogous to closed-loop systems which correct errors dynamically (Zhou et 
al., 2008). However, they suffer at higher movement speeds and are dependent on the uncontrolled condition of 
the environment they are operating in such as scene complexity, lighting, and weather (Servières et al., 2021; Yu 
et al., 2016; Zhou et al., 2008). In addition, because vision-based techniques rely on recognising and tracking 
visual cues within the surroundings, it becomes challenging in an unknown environment as the system takes more 
time to collect enough data to analyse the surroundings to deduce the user’s pose (Servières et al., 2021; Siltanen, 
2012). To overcome this challenge, predefined signs (markers) that are easily detectable by the visual tracking 
system can be placed in the environment. This approach is called marker-based, where the marker is used as the 
reference for the positioning system to superimpose the virtual objects onto the real world (Nee & Ong, 2023; 
Siltanen, 2012; Williams et al., 2014). However, a major drawback of marker-based techniques is that tracking 
requires the markers to be always visible in the coverage area of the camera for a stable experience (Nee & Ong, 
2023). A multi-marker tracking method that involves distributing a group of markers for the camera to detect to 
expand the coverage area has been developed. Yet, marker-based approaches are generally considered obtrusive 
and can be easily obstructed in construction sites (Song & Kook, 2022; Wang et al., 2013). 


To overcome the limitations of marker-based tracking, markerless tracking can recognise and track natural features 
in the environment such as edges, corners, and textures, and use them as location references to overlay virtual 
elements and determine the pose of the user (Nee & Ong, 2023; Servières et al., 2021; Siltanen, 2012). Markerless 
tracking offers several advantages, such as flexibility and ease of use since it doesn't require physical markers. 
However, because the term “marker-based” implies the need to do some preparations in the environment before 
initialising the coordinate system, the term “markerless” implies a misleading perception that it can work anywhere 
without previous preparations. Globally, markerless tracking is often perceived as a universally capable technology, 
however, its practical implementation presents numerous challenges in terms of computational requirements, noise 
reduction, and user-friendly interaction techniques (Servières et al., 2021). The majority of research in BIM-AR 
adopts either a marker-based or a markerless approach, overlooking “extended tracking” approaches which allow 
tracking of digital elements to persist in the user’s field of view even when the initial target is no longer in the 
frame of the camera (Vuforia.Com, 2023; Wikitude, 2023) 


Differences among vision-based techniques can then be divided into model-based and Visual SLAM 
(Simultaneous Localisation and Mapping), also referred to as V-SLAM (Figure 3). While both techniques depend 
on feature tracking and matching between a series of images, the difference mainly lies in the system’s knowledge 
of what it should track. Model-based systems use pre-existing information about the environment. In other words, 
the system has prior knowledge of what it will track which can be fiducial 2D features such as images and QR 
codes (marker-based approach), or a group of edges, corners, and textures that define a 3D object (object tracking 
approach) (Palmarini et al., 2018; Siltanen, 2012). In contrast, V-SLAM systems gradually reconstruct their 
environment while tracking the user’s pose (Nee & Ong, 2023; Yang et al., 2023). V-SLAM techniques do not 
have prior knowledge of what to track, and so they continuously create information about surroundings utilising 
an “ad-hoc” visual tracking method (Palmarini et al., 2018; Serviéres et al., 2021). V-SLAM relies on principles 
from “Structure from Motion” (SfM) to create a 3D structure of an unknown environment and then expands by 
incorporating the aspect of real-time pose estimation through a set of algorithms that optimise computational 
efficiency (Nee & Ong, 2023; Yang et al., 2013). However, because the environment map is required for the pose 
estimation and vice versa the main challenge of V-SLAM approaches is the accumulation of small errors in the 
estimated poses which can lead to larger errors in the map information, etc. The development of hybrid systems, 
that fuse different kinds of sensory information, was designed to create more accurate results. 
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3.1.3 Hybrid systems 


Hybrid systems fuse information from different sensors and cameras to compensate for the limitations of each 
technique. In particular, the fusion of visual tracking systems and inertial sensors has gained significant popularity. 
Cameras excel in providing precise measurements by leveraging visual feature matching and multi-view geometry. 
However, in scenarios where image quality is compromised due to factors like rapid motion or sudden changes in 
lighting, purely visual tracking systems often encounter failures (Nee & Ong, 2023; Williams et al., 2014). In 
contrast, inertial sensors remain unaffected by image quality issues and demonstrate particular proficiency in 
tracking high-frequency, fast motion. Nevertheless, the measurements obtained from an IMU are subject to high 
noise levels and drift over time (Nee & Ong, 2023; Zhou et al., 2008). By fusing visual information from cameras 
and inertial information from IMUs, the system is more able to dynamically correct errors and provide more 
accurate results in constructing a 3D map of the environment while estimating the pose of the device. This is 
known as Visual-Inertial SLAM (VI-SLAM) which is the technology used for fusing the information from the 
different sensors for environment mapping and 3D pose estimation (Jinyu et al., 2019). VI-SLAM can be 
considered a subset of multi-sensor fusion techniques. Multi-sensor fusion is a broader concept that encompasses 
the integration of data from multiple sensors, which can include cameras, inertial sensors, GPS, LiDAR and more 
(Figure 3). Challenging aspects in hybrid systems are the need to perform calibration between the cameras and 
sensors to ensure that their measurements are aligned within a shared coordinate system, a process commonly 
referred to as hand-eye calibration (Nee & Ong, 2023). 
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Figure 3 A comprehensive map of the positioning techniques used in BIM-based AR 


3.1.4 Manual Mapping 


To minimise the complexity of the dynamic real-time update of user movement relative to the surrounding 
environment, some studies used manual mapping techniques. Manual mapping involves using matrices to create 
a relation between world space and camera space and use this relation to overlay digital elements on physical 
objects (Amin et al., 2023). The world space represents the coordinate system that corresponds to the physical 
environment where objects have specific positions and orientations. On the other hand, the camera space refers to 
the coordinate system of the camera within an AR device which captures the real-world scene (Figure 4). By 
creating a similar situation using a virtual camera and a 3D model, the positioning system is then able to properly 
overlay and align digital elements from the 3D model by converting the coordinates from the world space to the 
camera space (Amin et al., 2023; Nee & Ong, 2023). Manual mapping techniques do not provide a dynamic 
experience as the user is restricted by the location of the camera. Few studies have experimented with this 
technique as in Dai and Lu (2010), Lin et al. (2020) and Gomez-Jauregui et al. (2019). 
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Figure 4 Demonstration of the concept of manual mapping from Lin et al. (2020) 
3.2 Dominant techniques and devices 


BIM-AR is used for four main applications in the construction stage: site assistance, construction planning, 
progress tracking, and inspection. Figure 5 shows the techniques used in the mentioned applications. Various 
methods and techniques are used across applications with the exception of LiDAR. And so, the development of 
BIM-AR platforms that are capable of supporting different positioning techniques may be required. In addition, 
more research is needed on the effectiveness of these techniques from a practitioner perspective. While several 
studies have been carried out in real-world construction sites, the focus has been on the technological aspects of 
BIM-AR not on the practical implementation of the technology. 
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Figure 5 Positioning techniques used in different construction applications 


4. DISCUSSION 


Common in BIM-AR positioning is the need to map anchor points between digital models and the physical 
environment. The positioning system will always require a physical reference in the real world that can be mapped 
to a reference in the digital model. In BIM-AR applications that utilise natural features for positioning, users are 
usually asked to select a vertical wall edge and a horizontal edge representing a horizontal direction, then select 
the corresponding edges in the digital model. The same process occurs with artificial markers; the locations of the 
physical markers are mapped to digital ones in the digital mode. The dynamic nature of construction sites requires 
that the locations of these anchor points will perhaps change frequently as the project progresses. Studies that used 
marker-based indicated that in the context of construction sites, there is a considerable possibility that markers will 
get obstructed by other objects (Kwon et al., 2014; Lin et al., 2020). Studies that used natural feature tracking have 
argued that they could be obstructed due to other construction activities or disappear because site scenes keep 
changing (Lin et al., 2019; Mirshokraei et al., 2019). It is critical therefore to understand how construction activities 
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unfold and assess the accessibility of reference physical elements before selecting the most suitable positioning 
approach. The locations of anchor points need to be coordinated and updated accordingly to ensure continuous 
alignment between the BIM model and the evolving site conditions. 


The continuous coordination of anchor points between the physical site and the digital model necessitates revisiting 
the responsibilities of existing BIM roles. Existing BIM roles are usually oriented around generating and managing 
information across different stakeholders for design activities. We envision that a new responsibility will emerge 
to manage the accurate positioning of BIM-AR models onsite. This new responsibility, dedicated to BIM-AR 
coordination, will coordinate the anchor points between the site and the digital model and manage the 
communication with site personnel and their safety. Therefore, more collaboration between BIM and site 
professionals is needed to consider factors such as site constraints, logistics, and project stages. Adapting BIM- 
AR positioning techniques to the nature of construction sites is a crucial step in leveraging the technology. In 
addition, it is necessary to revisit the skills of BIM roles and workforce training programmes in the context of 
BIM-AR requirements. 


5. CONCLUSION 


We have provided insights into the positioning techniques used in BIM-based AR for construction projects. By 
exploring the different positioning techniques used in BIM-AR, their interconnections, and their differences, this 
study aimed to provide insights to researchers and practitioners for assessing the suitability of these techniques in 
a construction site context. The results of the review identified three main categories of positioning techniques: 
sensor-based, vision-based, and hybrid systems. Sensor-based systems utilise various sensors like IMUs, laser- 
based depth sensors, and GPS to track the position and orientation of the device. Vision-based techniques rely on 
computer vision methods and can be further categorised into model-based and V-SLAM approaches. Hybrid 
systems combine information from different sensors and cameras to compensate for the limitations of individual 
techniques. VI-SLAM is the core technology of multiple-sensor fusion. Additionally, manual mapping techniques 
were discussed, although they are not commonly used due to their limited dynamic capabilities. The review 
highlighted the challenges and limitations of each technique, such as accuracy, computational efficiency, and 
robustness in varying environments. It became noticeable that the choice of positioning technique depends on the 
specific requirements of construction applications, and there is no one-size-fits-all solution. Hence, we proposed 
guiding the research on BIM-AR to involve flexible positioning systems that can adopt more than one technique. 
In addition, the findings shed light on the need for continuous coordination of anchor points between the physical 
site and the digital model considering the evolving nature of construction sites. Consequently, a new responsibility, 
dedicated to BIM-AR coordination, will emerge to manage the positioning of BIM-AR models onsite and 
communicate with site personnel. This calls for more collaboration between BIM and site professionals, as well 
as revisiting the skills and training programs of existing BIM roles to accommodate BIM-AR requirements. 
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ABSTRACT: Tower and Mobile Cranes are some of the most commonly used heavy equipment in all 
construction sites, and any crane failures could lead to significant human and monetary losses. Moreover, rigging 
configuration determination is a critical task that requires the rigging crew to have significant experience and 
knowledge of various failure modes that can be encountered when performing lifting operations. However, despite 
the criticality of training riggers, there has yet to be a comprehensive tool used to train and guide inexperienced 
riggers, and hence, more practical tools are needed. This paper proposes a framework for using Virtual reality 
(VR) and simulation to train riggers to identify the optimal rigging configurations based on the lift type and the 
external conditions. Through 3D modeling, the critical components of the rigging system are modeled to accurately 
simulate the rigging system and their performance when faced with critical loading scenarios. The developed 
framework is expected to allow inexperienced riggers to identify critical failure modes and enhance construction 
operations’ overall safety performance and productivity. Furthermore, several scenarios are assessed based on 
historical evidence for rigging configuration failures, and the efficiency of the training tool is assessed through 
real-life scenarios and tests. 


KEYWORDS: Crane Operations, Lift Planning, Rigging, Safety, Training, Virtual Reality. 
1. INTRODUCTION 


Globally, construction and industry-related incidents account for significant human and material losses; as a result, 
the construction industry is considered one of the most hazardous industries for workers. According to the 
Occupational Safety and Health Administration (OSHA,2020), in 2019, 1061 worker fatalities occurred in the 
construction industry, accounting for 20% of deaths in all industries. A contributing factor to the increased number 
of incidents is the use of heavy pieces of equipment, where many fatal injuries were a result of using heavy 
equipment.” For instance, in Australia, Safe Work Australia 2019 reported that there are, on average, 240 serious 
injury claims reported from crane safety incidents”. Furthermore, the current construction trends, such as Off-Site 
construction (Lingard et al.,2021), are gaining more traction for project execution. Moreover, Off-site construction 
is heavily reliant on cranes, and thus, an increase in crane usage is inevitable. Fatal incidents would increase 
exponentially with the increased use of crane construction projects. Thus, it is critical to understand the underlying 
reasons for incidents related to crane operations and to develop the necessary tools required to mitigate the number 


of incidents. 


Much research has been conducted to understand the causes of crane incidents. (Milazzo et al ,2016) Analyzed 
937 crane incidents and identified that the leading cause of incidents for different cranes is overturning and 
collapse, mainly due to structural failures caused by overloading. Another contributing factor is human 
error (Milazzo et al., 2016) found that one-third of crane incidents were triggered by human error, resulting in 
either weight underestimation or improper operation of the crane. (Lingard et al.,2021) Identified that the main 
types of incidents in crane operations are those related to electrocutions and tip-over incidents. On one side, 
electrocution incidents result from improper planning, unsafe working conditions, and human negligence. On the 
other hand, Tip-over incidents are mainly caused by overloading, loss of center of gravity, and outrigger failure. 
Finally, (Tam and Fung,2011) identified four main factors causing safety incidents in crane operations: 
negligence and misjudgment, inadequate training, subcontracting, and pressure from deadlines. Thus, to mitigate 
crane-related on-site incidents, it is necessary to eliminate human error, avoid overloading and structural failures. 
(Zhang et al.,2023) the overturning and loss of center of gravity can be prevented by securing the lifted loads; this 
prevention is a major duty of rigging personnel whose performance can be enhanced through adequate training. 


Training has been a primary focus for crane operations, where many tools were used to provide operators, riggers, 
and signalers with the proper procedure to perform lifts. In most cases, only traditional training methods were 
used, which centered around textbooks, video tutorials, and a limited amount of hands-on experience, which is 
less immersive. (Wu et al.,2020) argues that learning by doing has been recognized as a more effective training 
method. Furthermore, they argue that traditional lecturing courses are less effective in transmitting learning 
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knowledge. Luckily, recently, with the rise of VR, more research is being conducted to understand the impact of 
VR training on construction workers. For instance, (Joshi et al., 2021) used VR technology for safety training in 
the precast/prestressed concrete industry and found that knowledge gained through traditional methods is lower 
than that gained from VR training methods. (Song et al.,2021) developed a VR crane operator training module and 
reported that VR training effectively enhances crane control skills competence. 


Crane operations and VR training procedures are scarcely represented in the literature, where only a limited set of 
works were published regarding crane-related operations and VR training. Previous works focused on using VR 
to visualize crane lifts and hazard identification through clash detection. (Shringi et al.,2022) for instance, 
developed a hazard identification training model for crane operators, which focused mainly on tower cranes. ( 
pooladvand et al, 2021) Used interactive VR to evaluate mobile crane lifts. While (Song et al, 2021) developed a 
VR crane operator training module but did not take into consideration the rigging personnel training. Regarding 
rigging, (Zhang et al., 2023) developed a collaborative training model for crane operators, riggers, and signallers. 
However, the main focus was on collaboration between the different project participants. Furthermore, for rigging, 
only a semi-immersive CAVE system was utilized for collaboration and communication between the different 
project participants taking part in operating, and guiding cranes. However, a training system which takes into 
consideration the elaborate and complex properties of rigging components is yet to be tackled by any work. 


Thus, this work identifies the need for developing a fully immersive and engaging tool to train riggers and mitigate 
on-site incidents. To do so, the authors propose a fully immersive VR-based training framework for riggers to 
tackle the main causes of crane-related on-site incidents. The proposed methodology is expected to improve the 
overall safety of crane operations as well as decrease the number of incorrect assemblies and improving the overall 
performance of crane related operations. 


2. METHODOLOGY 


The overall methodology proposed by the authors is summarized in Figure 1. 
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Fig. 1:Research Methodology 


The initial step is to identify the audience of the VR tool. Later, tool selection is vital in any VR environment; a 
set of decisions are needed to select the game engine and the VR hardware to improve the overall experience. 
Next, the main issues encountered on-site are identified to enable the development of a realistic model capable of 
accurately representing the real working environment. Moreover, to assess the training progress, a set of 
assessment criteria is needed to estimate the trained personnel's overall performance objectively. The next step is 
to build the training environment by initially creating 3D models of the rigging components using 3DSMax and 
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SolidWorks. Then, to represent the construction site, a BIM model of a real project is exported to the VR 
environment, and a set of training scenarios are selected appropriately. Finally, a set of validation tools are 
implemented to validate the methodology, and the results are discussed. 


2.1. Identify The Audience 


The primary focus of this study is to utilize the Virtual Reality (VR) training tool to teach novice and inexperienced 
riggers the best practices for safe crane operations. The tool aims to enhance the overall safety of crane operations 
by addressing critical aspects such as better rigging assemblies, identification of erroneous assemblies, secure load 
lifting, and identification of faulty equipment. Additionally, civil engineering students seeking to learn crane 
operation concepts can also benefit from this tool. 


2.1.1. Define the learning objectives. 
The learning objectives of the training module are as follows: 


e Enhancing skills in rigging assemblies: Trainees will learn how to select and assemble the appropriate 
slings, shackles, hooks, spreader bars, and bolts according to the specific load being lifted. 

e Identifying and avoiding hazards: Trainees will be educated on identifying potential hazards during 
rigging operations and how to avoid accidents by adhering to safety guidelines such OSHA. 

e Securing loads during lifting: Trainees will gain insights into load calculations, determining the center of 
gravity, and ensuring the load is securely balanced and lifted correctly. 

e Detecting faulty equipment: Trainees will learn how to inspect rigging equipment prior to assembly and 
identify any faulty components that may compromise safety. 


2.2. Tool Selection 


To develop a fully functioning and accurate training module, it is first necessary to select the necessary tool to 
develop the module. The following sections will discuss the selected Game engine,3D modeling tools and the 
hardware used throughout this work, as well as the reasoning behind selecting these tools. 


2.2.1 Game Engine and 3D modeling Tools 


As for the game engine, Unity3D was selected due to its ease of implementation and the availability of a diverse 
set of libraries and tools which can be used for simulation purposes. The 3D modeling software selected are 3DS 
max, and SolidWorks.3DSMax was used to import and design some of the necessary components of the raining 
environment. While SolidWorks was used to create 3D models for the rigging components, such as the hooks, 
spread bars, slings, and eyebolts. Both 3D software were also chosen since they are compatible with the selected 
game engines, where it is advisable to use the previously mentioned software alongside a rendering software to 
provide a more realistic and engaging training environment. 


2.2.2 Hardware Description 


As for the hardware, an HTC Vive Headset was used for training. The headset was used to allow the user to explore 
the environment. two Wireless controllers were also used, their utility lies in allowing the user to move through 
the environment and interact (grab, assemble and dissemble) the relevant rigging components. The movement of 
the player was detected using two HTC bases. The overall system was run on a supercomputer, with an I9 CPU, 
and NVIDIA RTX A2000 GPU and 32Gb of RAM. 


2.3. Rigger Tasks 


After thorough analysis of crane operators training guides such as (ATP 2013), as well as analyzing relevant 
research such as (Zhang et al, 2023) works riggers tasks were identified as follows: 


e check the rigging equipment, in other words, select the appropriate slings, shackles, hooks, spreader bars 
and bolts in accordance with the module to be lifted. 

e load calculations, and center of gravity determination of the loads to be lifted. 

e Rigging equipment inspection prior to assembly. 

e rigging assembly, by ensuring that each component is secured and respects the angular limits. 
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e Load Control by ensuring the load is balanced and lifted properly throughout the lift. Ensure the center 
of gravity and hook are aligned. 


2.4. Training Environment 


In order to accurately model a construction site, and its properties. The Unity3D engine is used to develop a realistic 
construction site as well as the user interaction with said environment. The following section will delve into detail 
and discuss both the construction site and the human interaction within the environment. 


2.4.1 Construction Site and Rigging Configuration 


A virtual construction site was created to provide the trainees with a sense of dimensions and realism when training 
which can be seen in Figure 2. The site is populated with various interactable items similar to a real construction 
site. The rigging components and other items in the virtual training environment are modeled on 3D software with 
accurate dimensions and physical properties to those used in real operations. However, the interactable items the 
trainees can interact with, and use are mainly those required to assemble a rigging system. Additional components 
that are either faulty or unneeded are added to render the tasks more challenging and, eventually, more rewarding. 
Furthermore, to enhance immersion, audio, and interactive interfaces were used to imitate a real-life learning 
environment and provide an optimized training process. 


Fig. 2: Training Environment. 


2.4.2 VR Controls 


Replicating human interaction in a virtual environment is often a complex task, specifically when the Virtual 
environment is large in scale. The main difficulty encountered is regarding the locomotion system which is related 
to the ability to roam and use to environment.to tackle this issue, two different systems of movement are used to 
replicate the movement of workers on site. The first system, which is generally used in serious desktop based 
serious games where the user inputs the direction continuously through a touchpad. The second system is through 
a teleportation system, where the user can teleport to any accessible location in the construction site. The user is 
given the option to use one or a mixture of the systems according to their preferences. However, it is preferable to 
use the second system if the user is more likely to encounter VR dizziness. 


As for the interaction system, the user can grab, assess, assemble, and disassemble any rigging components, which 
is the main feature used to create a realistic interactive training system. Finally, after assembling the rigging, the 
user can evaluate the efficiency and accuracy of their assembly by performing a lift. The lift is subjected to the 
environmental conditions present in a construction site, which are accurately replicated in the Unity environment. 


2.5. Assessment Criteria 


In order to assess the performance of the participants and the efficiency of their training, a hazard identification 
index inspired by (Dhal Mahapatra et al,2021) is used to grade and measure their performance. The scoring system 


187 


identifies the primary and secondary tasks to be executed by the participant. Each task is given a score ranging 
from a one to three scale where one is a basic task that is fundamental but is not critical to the overall safety, and 
3 is an advanced task critical to ensuring crane operations. The score system is also affected by a time factor or a 
time detection factor to put an emphasis on completing the tasks within the allotted time. Table 1 illustrates the 
different training levels and their assessment criteria. 


Table 1: Training Assessment Criteria for different. 


Level Task Description Score 
Range 
Fundamental actions and responsibilities contributing to the safety 1-3 
Level | - Basic Task: and smooth functioning of crane operations. These tasks require 
moderate skill and understanding but are not considered critical or 
highly complex. 
Level 2 - Intermediate Moderately complex actions that are more crucial to the overall 4to7 
Task safety and efficiency of the crane operations. These tasks require a 
higher level of skill, attention to detail, and decision-making. 
Level 3 - Advanced Task Critical to ensuring crane operations' utmost safety, accuracy, and 8 to 10 


effectiveness. It involves handling challenging scenarios and 
potential emergencies and making critical decisions promptly. 
These tasks demand a high level of expertise, problem-solving 

abilities, situational awareness and timely decision making. 


2.6. Framework Validation 


(Harris et al, 2020) defined validation as the extent to which a test, model, measurement, simulation, or other 
reproduction provides an accurate representation of its real life equivalent. Furthermore, (Salinas et al, 2022), 
defined evaluation methods as either objective or subjective. 


2.6.1 Objective methods 


Objective methods are those methods which evaluate efficiency based on factual data. (Salinas et al, 2022) found 
four main objective methods used in literature. For safety training, they are defined as follows: 

e safety improvement: by comparing the behavior under training with the recommended behavior defined 
by safety requirements. 

e performance time: measuring the time needed by a trainee to preform the required tasks, while also 
considering the impact of making a wrong decision. This method is used to measure the consequences of 
timely decision making on the overall on-site safety. 

e number of errors measures the number of errors committed by the participants. And for this case study, 
compare the results for different groups of participants to gain a deeper understanding the VR tools impact 
on decreasing the number of errors made. 

e measurement of vital signs: based on monitoring vital sings of trainees to understand the physical and 
psychological impact of stressful and dangerous scenarios on the overall performance of the trainees. 


2.6.2 Subjective methods 


Subjective methods are those methods which evaluate efficiency based on the trainees’ feelings and preferences. 
(Salinas et al, 2022) mentions 4 major subjective evaluation methods which are sensory user emotions, expert 
analysis, user field workload and interviews and questionnaires. 


Thus, and in order to validate the proposed framework, different validation techniques are used in the initial trials, 
both objective and subjective. As for objective methods, two validation techniques were deemed appropriate, 
which are performance time and number of errors, these two methods were selected to measure the learning of the 
trainees as well as understand the impact of timely decision making on the overall performance of crane operations. 
The rest of the objective methods were not used for this work; However, the developed model allows the inclusion 
of these methods in future studies.as for subjective methods, questionnaires and interview were used to validate 
the overall experience the users had when interacting with the training environment. 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


3. MODEL DEVELOPMENT AND TRAINING PROCEDURE 


When developing the training procedure, a natural progression from more manageable tasks to more complex tasks 
was selected to increase the efficiency of the overall training. The training procedure was inspired by relevant 
training manuals such as the ATP 2013 and crane safety-related standards. The training consists of four primary 
levels, starting from a basic introduction to the crane safety problem. Then, a components identification module 
for rigging assemblies, followed by a basic assembly and secure rigging module. The training procedure is 
finalized with an advanced assembly and simulation concluding module. The four are explained in more detail 
below. 


3.1 Basics Introduction 
Briefly explain the importance of proper crane rigging techniques and the potential risks associated with incorrect 


rigging. Emphasize the need for a comprehensive training program to ensure safe and efficient rigging operations. 
A sample of the GUI is shown in Figure 2. 


| ee 


Welcome to Rigger Trainer 1.0. — your 
ultimate destination for mastering the art 
of rigging and lifting operations! We are 
thrilled to have you embark on this 
journey of skill enhancement and 
knowledge acquisition. 
Before..Explaining. the. training.process.., 
let's first understand why rigging is 
important to crane operations! 


Fig. 3: The Introduction GUI. 
3.2 Level 1 - Component Identification 


At the outset of the VR training program, we prioritize establishing a strong foundation for trainees by focusing 
on providing a comprehensive understanding of the fundamental components that make up a rigging system. 
Through engaging 3D models and interactive hands-on exercises within the virtual environment, participants will 
delve into the intricacies of identifying key elements such as slings, shackles, hooks, and spreader bars. The VR 
experience offers a realistic and immersive environment, allowing trainees to explore lifelike representations of 
these vital components. By interacting with the 3D models, they will gain invaluable insights into the unique 
features and applications of each component. This in-depth comprehension is critical for ensuring safe and 
effective rigging practices in real-world scenarios. As trainees virtually manipulate the slings, shackles, hooks, and 
spreader bars, they will receive instant visual and auditory feedback, fostering a dynamic learning process. This 
hands-on approach within the VR environment enables them to internalize the knowledge effectively, bridging the 
gap between theory and practical application. By the end of Level 1, participants will have honed their ability to 
recognize and differentiate between various rigging components accurately. Equipped with this essential 
knowledge, they will be well-prepared to progress to Level 2, where they will practically assemble these 
components into basic rigging configurations, all while harnessing the power and potential of VR technology. 
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Fig.4 Rigging System Components 


3.3. Level 2 - Basic Assembly and Secure Rigging 


With a solid understanding of rigging components acquired in Level 1, trainees progress to Level 2, where they 
embark on the practical application of their knowledge in a virtual setting. In this stage, participants will immerse 
themselves in the VR environment to virtually practice creating various basic rigging assemblies, honing their 
skills in a risk-free and controlled space. Within the VR environment, trainees will have access to an array of 
virtual rigging components, including slings, shackles, hooks, and spreader bars, similar to what they encountered 
in Level 1 as can be seen in Figure 4. 


They will have the freedom to select and manipulate these components to create different rigging configurations, 
such as single-leg slings, double-leg slings, and bridle slings. Real-Time Feedback: As trainees assemble the virtual 
rigging configurations, the VR system will provide real-time feedback on their actions. Visual indicators and audio 
cues will ensure that they correctly balance loads and securely fasten each component, enhancing their 
understanding of proper rigging techniques. This immediate feedback mechanism reinforces safe practices and 
encourages trainees to adjust until they achieve accurate rigging setups. 


One of the feedback mechanisms used to enhance the learning process, is through highlighting the proper location 
of each component within the overall assembly as can be seen in Figure 5. This to ensure that the trainee would 
not require the aid of external factors and their learning journey would be self sufficient using the rigging training 
model only. 
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Fig. 5: Different Basic Rigging Assemblies 


To enhance engagement and challenge trainees further, the VR program will introduce interactive challenges. 
These scenarios could include lifting differently shaped loads, coordinating multiple rigging components 
simultaneously. By navigating through these challenges, participants develop a deeper understanding of rigging 
complexities and learn to adapt their skills to diverse situations. Throughout Level 2, the VR training program 
follows a progressive approach, starting with simpler rigging configurations and gradually advancing to more 
complex setups. This gradual increase in difficulty ensures that trainees build their skills step-by-step, instilling 
confidence and proficiency in rigging operations. 


Fig.6: Training Feedback Mechanism 


3.4. Level 3 - Advanced Assembly and Simulation 


Level 3 represents the pinnacle of the VR Crane Rigging Training Program, where trainees are exposed to 
advanced rigging assembly exercises and sophisticated simulations. In this stage, participants take their skills to 
new heights as they tackle complex rigging scenarios, further solidifying their expertise and decision-making 
abilities. 

In level 3, trainees are tasked with creating complete rigging modules from scratch within the virtual realm, similar 
to those shown in Figure 6. These modules involve intricate configurations and incorporate multiple components, 
without any guidance, applying what they learned in the previous levels. The VR environment provides an 
extensive library of rigging equipment, allowing trainees to experiment with various combinations to achieve the 
best setups.in some scenarios, the participants are not provided with the optimal component and are expected to 
produce rigging assemblies with the available components, preparing them for unpredictable scenarios in the field. 
An integral part of level 3 enables trainees to evaluate the structural integrity and safety of their virtual rigging 
setups. After assembling and selecting a rigging configuration, a lifting simulation is used to assess the system's 
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stability and provide immediate feedback on potential weaknesses or hazards. This invaluable feature allows 
trainees to identify and rectify errors. 


Fig. 7: Different Advanced Rigging Assemblies 


Through out the training process, the VR system records their performance metrics, allowing for detailed analysis 
and evaluation of their skills and decision-making capabilities. This data-driven approach enables trainers to 
identify areas of improvement and tailor further training to meet individual needs. 


3.4. Level 3 - Advanced Assembly and Simulation 


After developing the training procedure, a set of tests were done by the authors and other research team members 
at the University of Alberta to assess the effectiveness of the training and optimize the movement and interaction 
systems used throughout the training. Furthermore, the trainees were asked to choose which locomotion system 
was to be used when moving, and the teleportation movement system was deemed more user-friendly than 
continuous movement. Other concerns were tackled, and the overall quality was enhanced. In terms of interaction, 
some users had some difficulties getting used to the interaction system, but this issue was mitigated with more 
system usage. 


Finally, the participants expressed that realistic graphics and environment improved their understanding and 
feeling of the tasks. However, they would eventually get fatigued when exposed to longer durations of using the 
model. 


4. CONCLUSION 


In conclusion, in this work, a framework for training inexperienced riggers using fully immersive VR was 
developed; the developed framework is expected to enhance the overall safety in crane-related operations as well 
as mitigate the number of incidents on construction sites that are caused by inadequate training, inappropriate 
rigging assemblies and human error. The expected improvements result from customizable and realistic training 
procedures that follow the health and safety regulations and the relevant training standards currently being used to 
train riggers and crane operators. 


The framework performance was tested on a limited number of students and researchers with experience in crane- 
related operations. However, further plans to have a more extensive and detailed assessment of the performance 
of the training module are planned in the works to follow. 


Furthermore, Future works include using other objective methods to further assess the developed model's validity. 
These objective methods would be used to provide a detailed comparison of the behavior of workers under training 
using VR and traditional methods—moreover, the measurement of vital signs to understand the impact of stress 
on their performance. This research can also be expanded on by creating a digital twin of the construction site and 
the rigging assemblies and then assessing the validity of the rigging assembly in real-time, allowing the workers 
to validate the lift in the virtual environment and modifying the actual assembly to minimize the possibility of an 
incident. 
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ABSTRACT: Safety management in construction sites has always been one of the most sensitive aspects of the 
AECO industry and a problematic that recalls the complexity of such a multifactor domain. The high number of 
work accidents that occur on construction sites is also caused by the fact that not all the information to work safely 
is always available. For instance, visibility during some maneuvers is a key aspect of safety in operations, and this 
is often impeded due to the layout of the construction site and working methods, especially in the use of some 
equipment. The latest approaches in order to overcome complex situations is represented by the Digital Twin 
paradigm. This approach has among its main criticisms: 1) the way of connecting physical reality and its digital 
replica and 2) the system for exploiting the combination of real-time data and digital applied intelligence for 
supporting operations on site. This paper proposes a framework for the development of digital twin of the 
construction site. An application of augmented reality that exploits the concept of diminished reality and workers 
location detection will improve visibility during critical operations. 


KEYWORDS: Diminished reality; Augmented Reality; Health and Safety; Building Information Modelling; 
Digital Twin; Construction Site Management. 


1. INTRODUCTION 


Safety in construction has always been a major issue that impacts the industry in terms of efficiency and cost. 
Construction is in fact one of the most hazardous industries due to its dynamic, temporary, and decentralized nature 
(Li et al., 2015). Among all economic activities, with the exception of oil, coal, and mineral extraction, the highest 
shares of serious incidents are recorded precisely in the construction divisions, which record rates above 35 
percent. In the five-year period 2014 - 2018, compared to cases that occurred in other sectors, events in construction 
showed a higher prevalence of risk factors related to the predisposition of work environments (19% vs 12%) and 
procedures implemented by the injured (40% vs 47%) (Inail, 2022). These statistics focus our attention on two 
aspects: 1. The worksite is a place that by its conformation and nature carries a high percentage of risk; 2. The 
proper execution of procedures is a critical aspect that should be worked on to reduce its percentage of risk that is 
the highest among all (Li et al., 2015). 


The traditional approach to safety also referred to as Safety-I (Martins et al., 2022) involves working to reduce the 
number of adverse events as much as possible. This most often involves using methods to counter or prevent the 
occurrence of misbehavior (e.g., audible warning systems on machinery) or to limit the effects should they 
eventually occur (e.g., individual and collective prevention devices). This approach involves having to envision 
all possible adverse scenarios so as to provide methods to counteract possible adverse outcomes. This is a 
continuous chase that however struggles to take into account a fundamental aspect that characterizes construction 
sites: their high complexity (Fang and Wu, 2013). Difficulty in standardization of procedures, complicated 
production processes, temporary organizational structure, relationships between stakeholders, and variable 
workplaces are some of the factors that contribute to the deep diversification between construction sites and other 
industrial production sites (Li et al., 2015). All these factors provide the backdrop for another crucial aspect that 
is encompassed in the concept of complexity and that is the need to manage the unexpected, which in terms of 
safety can be translated as dealing with new, variable and real-time risks and hazards. 


In this context, the use of new technologies to collect and analyze real-time data from the construction site can be 
a disruptive innovation (Teizer et al., 2022). A push in this direction is also provided in the Italian context by Law 
36/2023, which in Annex 1.9, article 12 letter m states: “Jn the formulation of information requirements by 
contracting stations, specific uses, operational methodologies, organizational processes and technological 
solutions may be defined as objects of evaluation for the purpose of rewarding, with reference to the execution 
phase of works, to digitally increase health and safety conditions at construction sites. ” However, there are still 
unresolved issues in this regard that prevent widespread and pervasive use of these methods in the real 
environment. On the one hand, real-time on-site data collection itself encapsulates within it several problems: the 
sensorization of construction sites (workers, vehicles, materials), methods for collecting images that can be 
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exploited to apply artificial intelligence, the possible need to have to provide power in the open environment to 
IoT devices, and the lack of a unified framework where to channel and integrate the collected data. On the other 
hand, information derived from analyses, short-term simulations or application of artificial intelligence should 
increase the knowledge of the stakeholders involved and thus in the interest of safety management they should be 
made immediately available on site in order for being used by operators, safety managers and site managers. This 
aspect represents a great complexity because making content usable on site requires finding quick and effective 
methods of communication both in terms of sending the data and visualizing it at the construction site. Augmented 
reality is a technology that has great potential in terms of visualization. Indeed, the use of holograms superimposed 
on the real-world view makes it possible to add an information layer that can show aspects not visible to the human 
eye or enrich knowledge with information from analysis of previously submitted data. In some cases, however, 
what is needed is not so much to add a layer of knowledge as to remove something that impedes the view. There 
is for this purpose a specific class of augmented reality that is called diminished reality (Cheng at al., 2022). 


The purpose of this research is to apply the concept of diminished reality to construction sites in order to support 
operations with regard to risks related to operator safety. The proposed framework and the first experiments 
developed involved moving loads with a crane work scenario, including work at height, where the operator of the 
vehicle has a not fully unobstructed view. The application of see-through diminished reality allows the crane 
operator to visualize any worker placed in the vicinity of the load through the use of a depth camera and artificial 
intelligence. Innovations of the proposed system include: 1. Visualization by the operator not only of the relative 
position between workers and load but also of the orientation of the workers' bodies. This information makes it 
possible to understand whether the worker is aware that he or she is near a moving load or whether there is a risk 
of catching the worker by surprise (in case it is shown that the subject is with his or her back to the load). 2. An 
automatic procedure that transforms the visualization from solid to wireframe of objects that are opposed between 
the machine operator and human figures detected by the depth camera. In this way through a two-way line of 
communication and interaction between the actual construction site and a digital representation of it, a Digital 
Twin (DT) type approach will be achieved in which data obtained are processed and sent back, after the necessary 
analysis, directly to the site real time use. The system proposed in this article makes more sense at large, complex 
construction sites where the expected safety costs are higher and therefore allow for the implementation of new 
technologies. 


2. LITERATURE REVIEW 


The use of innovative technologies such as virtual (VR), augmented (AR) and mixed reality (MR) in the 
construction industry has found different levels of application due to the differences between them. Applications 
using augmented and mixed reality have long since found their use in construction sites. Chalhoub et al., 2018 
proposed a method for electrical system component's location visualization on site using mixed reality and head- 
mounted displays. A similar way of exploiting MR is proposed in recent times also by Dallasega et al. (2023) for 
MEP components installations in general. A number of applications of AR and MR in construction sites still refers 
to way for showing design information or building components specifications in a more straightforward and easy 
understanding way (Carbonari et al., 2022; Yoon et al, 2022; Sabzevar et al, 2023; Pendersen et al, 2020; Um et 
al., 2023). 


As far as worksite safety is concerned, however, most applications focus on the possibilities of leveraging these 
technologies for operator training. Boschè et al. (2015) proposed a novel MR system uniquely targeted for the 
training of construction trade workers. One of the aims of this paper was to enable trainees to experience 
construction site conditions, particularly being at height, in different settings. Anyway, the majority of these 
applications makes use of VR or Augmented virtuality (Wolf et al, 2022) which means that the operator is 
transported completely into the virtual world to carry out serious game-like experiences but without testing in the 
real world or with operations carried out live. Jelonek et al. (2022) developed a VR application for training 
operators in the use of cutting equipment on site introducing in the serious game procedures to follow in order to 
be sure that all the involved personnel is following all safety prescriptions. Wolf et al. (2022) developed a serious 
game for inspector job simulation in a complete virtual environment. Speiser and Teizer (2023) tried to move 
forward introducing the concept of digital twin for construction safety training in order to link various data sources 
to generate a Virtual Training Environment (VTE) automatically. However, few applications are yet attempting to 
bring these technologies on site to provide real-time assistance. Nguyen et al. (2022) started to work on skeleton 
recognition for action recognition on site but they have not developed an AR visualization on site yet. Eiris Pereira 
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et al. (2019) combined training and augmented reality but directly on site for reporting situations with danger of 
falling from height. 


One aspect that is still underemphasized in the construction world is that sometimes it can be very useful that 
information is taken away rather than added, and in this sense the application that should be considered is that of 
Diminished Reality (DR) (Mann et al., 2002). While AR and MR superimpose virtual objects on the real world to 
enhance reality by placing new objects among real objects or extending real objects with virtual objects (Mori et 
al.,2017), DR deliberately removes parts of a real-world scene or replaces them with computer generated 
information (Mann & Fung, 2001) and it can be considered a subtype of AR. Based on computer vision techniques, 
unwanted image elements are detected and replaced by other image elements, creating an overall plausible and 
consistent impression for the viewer. The idea in diminished reality is to virtually remove something from the 
view. There can be identified different techniques for diminished reality implementation. In one kind of approach 
the object to be removed needs to be detected and the corresponding image area needs to be filled in with a texture 
that seems to belong to the background. In image processing terms, this kind of filling-in operation is called image 
inpainting (Siltanen, 2017). Another approach discusses a way of representing occluding objects as semi- 
transparent. This visualization technique is called AR X-ray vision, see-through vision, or ghosted views. Semi- 
transparent representation is useful for seeing through car interiors and walls (Mori et al., 2017). In automotive 
settings, a number of see-through displays have been incorporated. In Samsung’s Safety Truck, for example, live 
video images were displayed on the back of the truck, effectively granting trailing drivers the ability to ‘see 
through’ the truck. 


The very first approaches to DR aimed to lower the saturation of some areas to force an observer to face other 
regions and virtual objects overwrote undesirable real objects to hide the real information (Mann, 1994). After that 
this technique was used for different purposes: to remove a person from Google Street View pictures to protect his 
or her privacy, to remove a person in a video, to remove a vehicle in front of the driver, to remove a baseball 
catcher to visualize the view of the pitcher from a view behind the catcher, and to generate a panoramic 
stroboscopic image (Mori et al, 2017). In the case of see through applications one of the methods to provide the 
current state of the hidden area in the main view is the same as that used by Zokai et al. (2003) that used two 
additional cameras as hidden background observers to erase from the main view pipes in a factory. Mori et al. 
(2017) constructed light fields with a real time multi-camera system and removed a viewer’s hand from the 
perspective to visualize the viewer’s workspace occluded by his or her own hand. Queguiner et al. (2018) presented 
a diminished reality application running live on consumer mobile devices. In their pre-observation-based approach, 
the clean 3D scene, free of undesired objects, is scanned beforehand and reconstructed as a high-resolution textured 
3D model. Many see-through DR literature tends to investigate more computationally efficient approaches 
focusing on compensating real—virtual boundary in screen space. Thereafter, semi-transparent or wireframe 
representation is performed to improve user depth perception. These representation methods will be useful for 
avoiding the danger of a collision with the diminished objects. In this regard, Peereboom et al., 2023 presented a 
system that exploits DR for avoiding collisions between pedestrians and cars caused by poor visibility, such as 
occlusion by a parked vehicle. In one of the solutions, they proposed the occluding vehicle has been made semi- 
transparent. Still recently, Cheng et al. (2022) proposed a study on users' perception of diminished reality and its 
possible applications. 


However, the concept of DR is still little exploited in construction although Klinker, Stricker and Reiners (2001) 
develop the first examples of diminished reality in the field of construction with two very significant examples: in 
the first one TV antennas located on a hill are removed from the view within which the work that will replace them 
is later placed so that it can be visualized in its chosen location. In the second case given as an example, the 
installations below a wall are shown in the form of holograms. 


This research aims at exploiting the concept of diminished reality for construction sites operations efficiency and 
safety. The innovation of the system proposed lays in the combination of skeleton recognition, which provide 
operators position and field of view, and the automatic procedure that detect obstacles and made them visualized 
as a wire frame model. The integration of different technologies for onsite application is also another key aspect. 


3. DR FOR CONSTRUCTION SITE SAFETY MANAGEMENT SUPPORT 


The system proposed in this research is a diminished reality application for supporting safety management at 
construction sites. The development focused on a specific case and that is load handling by crane. In the case of 
cranes maneuvered from the ground, but not only, there are situations in which visibility can be reduced: work at 


197 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


height in which the support is not clearly visible, maneuvers on the construction site in which the placement of 
material takes place beyond already built constructions. For this reason, a viewer that supports the operations 
coordinator in visualizing possible hazardous situations can assume great value (Fig. 1). To this end, we focused 
in this work on visualizing workers in locations hidden from view. 


It was decided to proceed with localization and visualization through depth camera with a twofold advantage: on 
the one hand, this instrument returns the skeleton that allows us to understand how the worker is placed and 
oriented; on the other hand, the fact that only the skeleton is communicated removes a number of privacy issues. 
In this system then it was planned to also instrument moving loads, with sensors (UWB inside and RTK outside) 
in order to be able to communicate this data to the operations coordinator as well. Finally, the developed application 
involves the transformation of the obstacle object visualization from solid to wireframe. Such a system finds its 
application not only in the highlighted case but also in many other site operations such as demolitions where 
knowing what lies beyond certain elements could be crucial. 


Fig. 1: Diminished reality for construction site. 


4. METHODOLOGY 


The methodology followed in this research for improving the safety of workers in construction sites through the 
exploitation of DR is shortly described in Figure 2. 


The first step is setting up a localization framework covering the entire construction site. Such framework should 
be a real-time system capable of attaching spatial coordinates to any entity, object or person, moving around in the 
construction site. The specific technology employed for this step can vary depending on several aspects, e.g. 
whether the construction site is outdoor or indoor, or whether we are considering an existing facility to be 
recommissioned or instead a new building or infrastructure to be built. For complex scenarios, multiple 
technologies may coexist at the same time to localize objects and people in different areas of the overall 
construction site. In such a case a problem arises of allowing a seamless integration of different localization 
systems to provide a homogenous notion of localization on the construction site. The core idea behind the 
localization framework is to support the detection of safety hazards so as to prevent incidents. 


Asecond step of the methodology is the setup of a heterogeneous network of sensors continuously monitoring the 
activities in the construction site. The sensor network may employ very different devices both for their 
technological characteristics and for the kind of data collected (e.g. temperature sensors, cameras, NFC tags, etc.). 
The basic requirement for each sensor in our methodology is that it must be able to attach a precise timestamp and 
a precise set of coordinates relative to the localization framework set up. This would witness when and where the 
information was sensed in the construction site. 
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A third step is the setup of a DT Platform integrating several sources of information operating on and off the 
construction site. For instance, interpolating information coming from the construction design together with data 
collected by the network of sensors the platform can provide the most updated “picture” of the building or 
infrastructure being realized. The DT Platform is also responsible for managing and harmonizing the several data 
models used to express the information coming from the different (internal and external) sources, e.g. by using 
linked data or ontology-based semantic data integration procedures. 


Monitor objects Integrate Obstacle 
and workers information recognition Rendering / 
using sensors using BIM using Spatial Visualization 

network platform Engine 


Setup 


ocalization 
framework 


Fig. 2: Methodology for applying Diminished Reality to construction site safety. 


Next, a Spatial Engine should be implemented that processes all the spatial information from the BIM models, the 
persons equipped with DR viewers, the sensors deployed through the site, the vehicles and all the various agents 
acting within the site. Given the position of a person using a DR viewer, its main role is to determine which agents 
and BIM objects are of interest for him/her and to provide the corresponding data streaming. Moreover, by 
exploiting the information stored in the CDE, the Spatial Engine is the place where possible spatial interferences 
can be detected, forecasted, and notified at run time to the involved agents in order to prevent injuries. 


Finally, a DR visualization tool is in charge of rendering holograms mixed with the real objects of the construction 
site in order to allow the observer to view also those hidden objects that can be involved by the ongoing operations. 
The obstruction is determined in real-time based on the observer’s point of view and the position of the moving 
objects given by the sensory data: when a BIM object is hit by the ray joining the observer with a moving object, 
this BIM object is classified as obstructing and it is rendered as wireframes thus making visible the virtual objects 
behind it that reproduce their invisible real twins. 


In the following sections this methodology is applied starting with the indoor localization framework. The sensor 
network has been set up with a depth camera and two location tags. The former is a special computer camera with 
two “eyes”. By merging the visual information coming from them, it can compute the distance (or depth) of objects 
and bodies w.r.t. its point of view, when they appear in the picture. The chosen depth camera also provided 
advanced artificial vision software capable of recognizing the position of core joints in the skeletons. This 
information can be exploited to know the real-time position and orientation of an individual or of a crew of 
workers. As a consequence, it is possible to relate this information the one coming from the tag attached to the 
load lifted by the crane in order to know the exact relative position and the risk of the workers to be hit. A mixed- 
reality head-mounted display (HMD) is used as visualization tool where the BIM model of the construction site 
and signals coming from the sensor network (e.g. the position of the workers in the construction site coming from 
the depth camera, as well as the position of the depth camera itself, and of the crane load) are downloaded from 
the DT platform together with the spatial simulation results. Also the head-mounted display DR application can 
download from the DT platform information about the 3D objects that hinders the view of workers and the load 
when the crane is operating thank to ray-tracing data transmission. The ray-tracing technique uses basic geometry 
in order to select all the 3D objects (e.g. walls, doors, columns, etc.) that lay in between the user viewpoint and the 
objects returned as obstructions (i.e. the workers and the crane load) and then the platform sent them back made 
transparent in a wire frame visualization. 


5. SYSTEM ARCHITECTURE 


The architecture of the system proposed in this paper is the one depicted in Figure 3. In accordance with what was 
previously described in the methodology paragraph in this section the specific devices and components of the 
system will be made explicit. For what concerns the localization of the operators, we chose to use a depth camera 
and specifically a ZED 2i. This device made possible two things: 1. recognize the skeleton of the workers; 2. place 
the skeleton in the virtual space once the position of the camera in the real space is known. The ZED 2i camera 
provides the streaming of the joints of the skeleton, which were set equal to 18 points. 
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Fig. 3: System architecture and data flow. 


For space modelling it can be used a BIM model provided in ifc format. This type of space modelling contributes 
in modelling the different building components enabling at a later stage to go and select one by one the objects 
that represent obstacles to workers visualization. In the development here proposed, Autodesk Revit was used for 
BIM modelling. Moreover, in the case of the construction site where the presence of the elements is dependent on 
the progress of the work, the combination of ifc files with a timeline also makes it possible to analyse what would 
be visible and what would not be visible at a precise moment in the construction. 


Finally, since not only the workers need to be located but there are other components of the worksite whose location 
is important to know, a number of sensor networks can be implemented. These vary as needed. In the case taken 
as an example in this article, two types of sensors were referred to: GPS RTK tags and UWB tags. Although the 
GPS RTK localization system could not be integrated into the indoor tests reported in section 6, its suitability for 
outdoor applications is shown by its quite high accuracy reported by technical sheets [Datasheet], which was 
confirmed by very preliminary trials performed by the authors. But this is going to be used in future research 
developments. For the purpose of the indoor tests subject of this paper, contextual infrastructure was monitored by 
means of UWB anchors. In any case, it is worth remarking that both types of sensors can be used to communicate 
position data. 


Moving on to the other components of the system, a platform, resulting from the development of the research 
project "A Distributed Digital Collaboration Framework for Small and Medium-Sized Engineering and 
Construction Enterprises" (PRIN 2017), is used. This platform uploads BIM models in ifc format, allowing 
browsing and querying. The platform then allows the integration of data from a variety of heterogeneous sources 
(in this case skeleton joints from ZED 2i and location data from sensors) allowing them to be located in space or 
linked to components (e.g., modeled building objects). In this way it is configured as a real DT platform. 


As for the spatial engine, this receives the necessary information and thus the positions of workers and moving 
objects (provided by UWB or GPS-RTK). The purpose of the spatial engine is to perform real-time processing 
about spatial interference and then transmit it to the platform, which will also relate it to the components of 
buildings in space so as to identify objects of obstruction to operator's view. 
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Finally, the last component of the system is an MR head-mounted display whose use has two purposes. The first 
is to enable the display of holograms for a diminished reality visualization. The second is to provide the position 
and orientation of the observer. The Microsoft Hololens used as the MR provider receives the information directly 
from the platform and thus the visualization of the skeleton joints, the model of the surrounding environment, the 
positions of moving objects, and objects identified as obstacles and whose representation is then modified by the 
procedure to change from solid to wireframe. In order to achieve a consistent visualization, the Hololens transmits 
to the platform the position of the operator with respect to the modelled space and information about the orientation 
of the head. In this way it is possible to precisely calibrate the sample scenario. 


6. DIMINISHED REALITY APPLICATION 

The experiment was conducted using a Python application sensing the skeleton information coming from the depth 
camera ZED 2i through the Python SDK. In this application, the camera acquired two video streams at 30 fps, and 
once every 3 frames the skeletons information where encoded as JSON objects and published using the MQTT 
protocol. For debug purposes, the position of workers' skeleton joints where highlighted in a live video stream 
(Fig. 6 b). This resulted in 10 MQTT messages sent per second. The DT Platform subscribed the topic on the 
MQTT broker, thus receiving the skeleton signal from the Python application. The DT platform was in charge of 
sending the skeleton signal as well as the BIM design and the location of the camera, the Head Mounted Display, 
and the object moved by the crane, to the Spatial Engine. The latter was responsible for detecting in real-time the 
list of obstacles in the BIM design and passed it to the Head Mounted Display, together with the position of the 
workers and the crane load w.r.t. the Head Mounted Display. 


The logic behind the Spatial Engine detects obstacles using ray-casting, a technique implemented by the Spatial 
Engine which returns a list of objects hit by a hypothetical ray departing from a source position towards a given 
direction, and having a length bounded by a given maximum value. To reach our aim, we need to use the user's 
viewpoint as source position of the ray-to-be-casted, next we need to cast a ray on each direction corresponding to 
each joint of the workers skeletons detected by the depth camera and each object moved by the crane; finally, we 
must set as maximum length of the casted rays the actual distance between the user and the joints or moved objects. 


7. DEVELOPMENT AND CASE STUDY 


Feasibility tests of the developed application were carried out in the DC3 laboratory of the DICEA department of 
the Polytechnic University of Marche. In this case since a crane was not available, the overhead crane carriage 
inside the laboratory was used to simulate the suspended load in motion. The hook of the bridge crane was 
instrumented with a UWB position sensor which sends data to the platform (Figure 4). The laboratory is equipped 
with UWB anchors placed at the corners of the room for precise position sensing. 
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Fig. 4: a) Bridge crane hook instrumented with UWB sensor; b) UWB sensor visualization in the space inside 
the platform. 


Next, the ZED 2i camera was placed as seen in Figures 5 and 6 a). The information that the chamber returns 
through the recognition of the skeleton joints is as in Figure 6 b). The orientation of the worker is also detected as 
the blue joints detect the left side of the body while the red joints detect the right side. In this way it is possible to 
understand which direction the worker is looking at and thus whether in the presence of a moving load he is 
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presumably aware of the presence of the load or whether his being from behind identifies a situation of greater 
danger as he may not have noticed that something is moving in his vicinity. 


[i 


| 


——— 


Fig. 5: Components position inside the laboratory. 


The test performed reproduced the same conditions shown in Figure 1. The laboratory is placed in a shed divided 
in half by a wall 3.5 m high, about half of the total height of the structure. Figure 5 shows the location of all 
components for the experiment in the laboratory. In order to verify the actual operation of the application for the 
developed diminished reality, the operator wearing the HoloLens stood on the left side in the floor layout of the 
shed. Looking toward the partition wall (Fig. 7 a)) the HoloLens showed the wall in wireframe visualization and 
the skeleton points of the operator who was moving around the load on the other side of the room. In this first 
processing the positioning of the load was not implemented in the hologram visualization although it was present 
as data in the platform. Again, it is possible to observe the differentiation of blue and red colours for the left and 
right side of the body respectively which allows the worker observing the scene to understand the orientation of 
the worker. 


Fig. 6: a) Load position and ZED 2i location; b) Depth camera vision and skeleton joints recognition. 
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Fig. 7: a) Vision from inside the HoloLens (other side of the wall) ; b) Vision from inside the HoloLens (same 
side of the worker framed); c) operator with the HoloLens and the bridge-crane remote control (other side of the 
wall). 


8. CONCLUSION 


This paper presents an application of diminished reality for safety management support during operations at a 
construction site. The proposed system involves the use of depth camera and sensors for locating components of 
interest and implements the development of a DT platform for real-time management. The system thus developed 
allows for optimization in the number of sensors that can be placed in strategic areas (depth camera) and alternately 
on moving loads or equipment to be monitored at the stages when they are expected to be utilized. Initial 
experiments of the application of this development were carried out in the laboratory by reproducing conditions 
similar to those at a real construction site. The next steps will concern the implementation of the visualization 
inside the HoloLens of the position of the moving load and tests in outdoor construction sites thus having the 
possibility to test also the visual rendering of holograms in the open air that could be an obstacle to the optimal 
visualization for operators. 
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ABSTRACT: Virtual reality simulations conducted by driving simulators represent a methodology to assess both 
the quality of road design and road safety in a safe, controlled, and replicable environment. 


Nowadays, there are numerous studies that use driving simulators to analyze the driver's response when specific 
road safety treatments are planned before these are implemented. This approach allows the road designer/scientist 
to estimate the potential safety effectiveness of the countermeasure/design configuration considered. 


However, although virtual reality simulations are potentially extremely useful in the evaluation of road 
configuration design and treatments effectiveness, they also have cons. The two most important are the limitations 
in the reproducibility of the realworld environment and the difference in drivers’ behavior due to the awareness 
that they are conducting a test. 


In this context, our research compared the data collected during virtual reality experiments with those collected 
in the field with an instrumented vehicle, after a few years from the implementation of the specific safety measure 
on a real road. Statistical analyses were conducted to compare the results of the two experiments to demonstrate 
the reliability of the virtual simulations and to identify the limitations. 


KEYWORDS: driving simulator, road safety, virtual reality, road safety treatments, road safety measures 
effectiveness, in-field test 


1. INTRODUCTION 


Making safe roads and decreasing the number of accidents, deaths and injuries on the roads is one of the greatest 
challenges of this century. 


Nowadays technology allows scientists, engineers, and technicians to approach road safety including also human 
behavior. Virtual reality represents one of the instruments that allow road engineers to understand and evaluate in 
a safe and controlled environment how both road configuration and other contingent factors, affect human behavior. 


When an unsafe road section is identified as a “black spot’, e.g., through an analytical accident analysis, the Road 
Administration (RA) works “to change” the road configuration, improving aspects related to road traffic safety. 
The solution selected is then implemented on the road and only a few years after the intervention, it can be observed 
whether the proposed countermeasure has been effective (or not). This type of reactive approach requires investing 
high budgets to correct the road deficiencies and waiting years to see if the solution proposed was useful in the 
accident mitigation phenomenon (not always proving its effectiveness). 


The use of virtual reality approach, instead, allows to investigate the described phenomenon in a proactive manner 
(before the accident occurrence), using a controlled, safe and ethical test environment (Calhoun and Pearlson, 
2012). Virtual reality also allows to perform a safe, controlled, reproduceable and standardized experiment (de 
Winter et al, 2012). In fact, the researchers define the “road environment” within a scenario able to describe the 
main phenomena to be studied, for example, different road geometry (Bassani et al., 2019b; Bobermin at al., 2021; 
Montella et al., 2018), different intersection layout (Danaf et al., 2018; Kekez et al., 2022) or different cross section 
organization (Bella, 2013; Ben-Bassat and Shinar, 2011; Mecheri et al., 2017; Domenichini et al., 2018); or 
introducing events describing specific road users’ interactions or limited sight distance (e.g., driver-pedestrian with 
occlusion, etc.) (Bassani et al., 2019a; Domenichini et al., 2018) 


Unfortunately, the use of driving simulators also has cons to be considered. Limited physical, perceptual, and 
behavioral fidelity of the instrument (driver simulator) affect both the experimentation reliability and the driver’s 
interest in the test, especially if the vehicle cannot reproduce the vehicle performances in an accurate way (de 
Winter et al, 2012; Boda et al., 2018; Pawar et al, 2022). The possibility to transfer the study results to “an actual 
road safety countermeasure construction” needs the results to be evaluated in terms both of fidelity and validity. 
The first validation processes were defined since the last century (Blaauw, 1982; Klee et al., 1999). The term 
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“fidelity” has been used to describe the ability of the driving simulator to reproduce the sensory stimuli present in 
a real driving environment. This ability was strongly dependent on the quality of the equipment (e.g., motion 
system, projector, screen and display, simulacrum and sound) (Kaptein et al., 1996). Wynne et al. (2019) conducted 
an extensive literature review concerning driver simulator validation studies. According to the definition given in 
Blaauw, the word “validity” describes the ability of the study to accurately represent the drivers’ behavior in a real 
world. The study considered two types of validity: absolute validity and relative validity (Blaauw, 1982). 


The most used parameter to check the quality of the driving simulator results was drivers’ speed (Cao et al., 2015; 
Bella 2008; Bham et al., 2014; Yan et al., 2008; Branzi et al., 2017). Often the speed was coupled with other 
parameters describing the drivers’ performance such as acceleration/deceleration or lateral position (Blana and 
Golias, 2002; Chen et al., 2021; Kazemzadehazad et al., 2021) and drivers’ reaction time (McGehee et al., 2000; 
Engen, 2008). 


The literature shows that in the analysis of road safety the use of the driving simulator represents a great 
opportunity for road engineers and scientists to assess preliminary the impact of a specific engineering treatment. 
The virtual reality evaluation allows also to define the best safety solution with reference to the specific road safety 
objective and without any implementation cost. The objective of this research is to complete the research conducted 
in 2017 (Branzi et al., 2017) also investigating the ability of the LaSIS driving simulator to reproduce the drivers’ 
behavior when different safety countermeasures were present along the road. This validation study compared the 
speed profiles of the driving simulations and the real world drivers, evaluating where the results are similar and 
which effects could instead cause differences in drivers' behavior and in virtual reality results exploitation. 


2. METHODOLOGY 
2.1 Research history and overview 


Via Pistoiese was studied by our research group in recent years, especially concerning pedestrian safety. Statistics 
on accidents in Florence always placed this street in the first places in terms of danger, especially for vulnerable 
road users (VRUs) (Domenichini et al., 2014). 


The safety problems of the street were different, and they included the high speed, the high level of interaction 
between traffic and VRUs due to the strong traffic demand, the geometrical configuration (a long straight about 4 
km), and the high presence of commercial activities, residential areas, and parking stalls along the road. 


To improve the road safety of the area, part of the street was interested by a reconfiguration project, where 
numerous traffic calming measures were defined and implemented with the aim of limiting the speeding 
phenomena, including: 


e introduction of both raised pedestrian crossings and/or raised intersections to control the speed along the 
section; 

e installation of a raised median curb to avoid overtaking maneuvers, but which can be performed only by 
the emergency vehicles; 

e reduction of the lane width to the standard value for this type of street; 

e introduction of high perception elements to improve the driver perception of the context. 


A few years before the road modification, the entire reconfiguration project was studied in virtual reality. In 
Domenichini et al. (2018) and in Branzi et al., (2018) the good results obtained from the experimentation were 
extensively described. 


In 2018 the safety solutions evaluated in virtual reality were implemented along via Pistoiese and nowadays are 
part of the road environment. In this context a new experimentation was conducted by the authors to monitor the 
effectiveness of the engineering treatments over time, and to understand if the results obtained by the LaSIS driving 
simulator are reliable. 


This latter experiment was conducted in-field with a specific device named V-BOX HD2, which is similar to a 
black box capable of recording the kinematic parameters of the moving vehicle. The experiment can be considered 
as a validation experiment for the result obtained in the virtual reality evaluation comparing speed and 
acceleration/deceleration behavior. Figure 1 represents the research approach and the connection between the two 
different tests conducted, in virtual reality and in-field. 
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Fig. 1: Overview of the research 


2.2 Detailed description of the road context 


Via Pistoiese is a street located in the suburban area of Firenze, classified as an urban collector road (it serves 
penetration movements but also VRUs movements for commercial and residential activities). The road geometry 
is simple, and it is composed by one curve (R=250 m) that connects two straights which are connected to the road 
network with roundabout intersections. The road segment is about 4.5 km long (Figure 2). 


Fig. 2: via Pistoiese (Firenze) 


The cross section is about 18 m wide. On the roadside, a 2.00 m wide parking area and two sidewalks are present 
(1.50 m wide) on both sides. A wide cross section is dedicated for motorized traffic, and it is organized in different 
configurations along via Pistoiese as described below: 


e different number of lanes (segments with one lane per direction and segments with one lane in one 
direction and two lanes in the other one); 

e presence/absence of a median curb that does not allow left turn maneuvers; 

e numerous and different traffic calming interventions (such as raised pedestrian crossing, raised platform 
in the intersection, chicanes, etc.). 


In this paper, the analysis was conducted with reference to the road segment interested by the traffic calming 
treatments (in orange in Figure 2). In Figure 3 and in Figure 4 some comparisons between the actual street and the 
virtual scenario are shown. 


Fig. 3: via Pistoiese: real world VS virtual reality (1/2) 
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Fig. 4: via Pistoiese: real world VS virtual reality (2/2) 


2.3 Apparatus 
Two different apparatus are used in this research, the LaSIS Driving Simulator and the V-BOX HD2. 


The LaSIS Driving Simulator is a motion-based simulator, equipped with a full-scale Lancia Y simulacrum fixed 
on a 6 degree of freedom Stewart’s platform. The platform allows roll, yaw and pitch movement of the vehicle. 
The vehicle interior includes all commands normally available within a car, with a steering wheel with force 
feedback. The three rear-view mirrors, the central one and the side mirrors, are equipped with displays that project 
the scenario just traveled and complete the vehicle interior. The cabin is surrounded by a cylindrical screen about 
200° wide where 4 projectors reproduce the driving environment. The sounds in the environment and in the 
participant’s car are generated by a multichannel audio system. The data acquisition frequency of the apparatus is 
20 Hz. According to the classification proposed by the literature in term of the ability of the device to emulate 
driving in real world (i.e., vehicle controls, field of view and kinesthetic), the LaSIS driving simulator can be 
classified as a high-fidelity driving simulator (Goode at al., 2013; Wynne et al., 2019).. 


The VBOX HD2 system used for the on-field test consists of a mobile device, like an advanced black box, able to 
record dynamic information concerning the vehicle movement (such as speed, GPS position, acceleration, 
deceleration, position in the lane, etc.). The instrument needs to be fixed inside each passenger car used in the on- 
field experiment. The acquisition frequency is different for GPS and video information and respectively equal to 
10 Hz and 60 Hz. The VBOX application allows to read the measurements synchronized and check in a remote 
analysis the information related to the recorded data (e.g., available satellites, traffic conditions, etc.). 


2.4 Participants and procedure 
Participants were recruited on a voluntary basis among students, staff, expert drivers and common people. 
In both tests, drivers had to meet the following requirements: 


e possession of an Italian valid driver’s license; 
e normal or corrected-to-normal vision. 


Two samples were recruited composed of 48 users and 36 users respectively for virtual reality and on-site test. 
Samples do not contain people who drove in both tests due to the different time frame in which they were conducted 
(2015-2016 and 2021-2022), but mostly because the selection of the same participants can affect the drivers’ 
behavior in the second experiment due to the previous experience in virtual reality. Table 1 summarizes the 
participants’ characteristics. 


Table 1: Participant characteristics 


Virtual reality experimentation In-field experimentation 
M F M F 
Gender 36 12 28 8 
Age 42.2 (S.D. 12.7) 40.6 (S.D. 17.12) 


NOTE: S.D. standard deviation 
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Their driving experience (years of driving license possession) ranged between 3 years and 46 years. Except for 5 
participants, all of them declared that they travelled at least 5,000 km in a year. 


Each participant was tested individually, according to the two different procedures adopted respectively for virtual 
and full-scale tests (Domenichini et al, 2018; Meocci et al., 2023). Table 2 summarizes the main steps of the two- 
procedures adopted for the experimentations. 


Table 2: Procedures summary 


Virtual reality test In-field test 


No payment for the involvements 


Participants were not informed about the research objective 


Test duration about 35-40 min in safe and controlled environment Test duration about 35-40 min in real-word but in a defined path. 
(LaSIS laboratory) No restriction of the traffic conditions was defined during the test. 
The drivers’ performances were recorded by the LaSIS driving The drivers’ performances were recorded by the V-BOX HD2 
simulator fixed each time inside the drivers’ own car 


2.5 Data collection and analysis 


To test the validity of the results obtained by the LaSIS driving simulator the comparison was made analysing the 
speed profiles obtained in virtual reality experimentation and in full-scale test. 


A preliminary comparative analysis of the entire average speed profile was conducted to analyze if the simulator 
results showed the same patterns (and macroscopic effects) as those measured in the real world (relative validity). 
The comparison was made only with reference to the profile sections where there were no conditions that 
influenced the drivers’ speed (i.e., pedestrians who cross the street in the simulation or traffic congestion in the in- 
field experimentation). A qualitative comparison was also carried out with reference to the V85 speed. 


The absolute validity of the simulation results was evaluated by means ofa statistical test. The two datasets consist 
of the speed measurement along via Pistoiese in virtual reality and in-field. The two datasets were preliminary 
verified by the Shapiro-Wilk and Levene’s tests respectively for normality and homoscedasticity assumptions. In 
the former test Ho states that the variable is normally distributed, in the latter Ho states that the variables we 
compared had equal variance. Both the tests were conducted with a significant level of 5%. 


Subsequently two tests to compare the averages of two groups and determine if the differences between them are 
more likely to arise from random chance were conducted, the t-Student’s test for independent sample when the 
sample was normally distributed and the U Mann-Whitney’s test, a non-parametric test, for the other samples. 
Both tests were conducted with reference to the null hypothesis HO: the difference in mean is equal to 0. In all 
cases where the null hypothesis was rejected, also the effect size was determined by the d-Cohen metric. This 
allows to define the strength of the relationship between two variables compared. 


Finally, according to Losa et al., (2013) a regression analysis was conducted to investigate how the driving 
simulator experimentation reproduces the real-world performances, considering each road segment. 


3. RESULTS AND DISCUSSION 
3.1 Relative validity: average speed profile comparison 


Figure 5 shows the result of the preliminary comparison between the two speed profiles. Specifically, in the chart 
were depicted the speed profile recorded in virtual reality (in red), the average speed profile recorded the in-field 
test (in green), the absolute difference between the two speed profiles (in light blue), and the number of lanes in 
the considered direction (in black). Furthermore, the red dashed line indicates the position of the pedestrian 
crossing axes where an event was reproduced in the virtual reality simulation (pedestrian who is crossing the street). 
Finally, green dashed lines indicate the position of the stop lines within the intersection and green dotted lines 
indicate the position of the pedestrian crossing axes. 
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Fig. 5: speed profile comparison (above: average, below: V85) 


The trend of the two profiles is similar. However, there are some areas where differences between the two profiles 
are noticeable. Regarding the simulator tests, the pedestrian who is crossing the street should be mentioned. This 
event obviously affected drivers’ speed and caused significant deceleration, thus influencing the results obtained 
in virtual reality experimentation. This explains the negative peaks of the average profile in virtual reality (dashed 
red lines). On the other hand, as far as the field tests are concerned, it must be remembered that the traffic conditions 
were not restricted or imposed such as in virtual reality where all traffic lights were green. Therefore, drivers' 
speeds were sometimes reduced due to stationary traffic or otherwise significantly delayed by red lights. In this 
sense, the indication of the position of both the intersections and the pedestrian crossings are needed to better 
explain where this type of event could potentially occur. At the same time, information on the number of lanes can 
help to understand where traffic queues are most likely to occur. Figure 6 shows that in the areas where there were 
high levels of traffic or traffic queues a negative peak in the green curves was present. To overcome these issues, 
it seems more appropriate to compare the mean speed profiles only in similar traffic conditions (i.e., where the 
contingent conditions are the same for virtual reality and in-field tests), excluding therefore all the road segments 
where speed profiles are strongly affected by external conditions (i.e., pedestrian crossing the street in virtual 
reality tests (in red) and delays due to high traffic level in real world (in yellow)). Therefore, as shown in Figure 
6, eleven (11) different segments were identified. 
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Fig. 6: identification of street segment subjected to different condition between virtual reality and real world 


The 1%, 3°, 5, 8, and 10‘ segments represent the road sections where the “contingent conditions” were the same 
for virtual reality and in-field tests. Thus, statistical checks of absolute validity were carried out only in these 
sections. The 24, 6", 9", and 11" segments represent instead the road sections where the mean speed profile for 
the in-field test is strongly affected by traffic conditions. These areas were highlighted in yellow. Finally, the 4", 
and 7" zones describe the road sections where the mean speed profile obtained in virtual reality is strongly affected 
by the event where a pedestrian was crossing the street. As indicated in Domenichini et. al., (2018), the pedestrian 
starts crossing when the vehicle was at a stopping distance from the pedestrian crossing axes (equal about 55 
meters). Therefore, the influence of the event begins 55 meters before the event takes place. The end of the 
influenced area was assumed equal to the one assumed in the previously mentioned study, e.g., in the point at 
which drivers, after braking because of the pedestrian crossing, recognize that they have regained an adequate 
driving speed by significantly easing the pressure on the accelerator pedal to a minimum and constant value. These 
areas are highlighted with a red box. 


The preliminary analysis shows a relative validity of the virtual reality analysis, but obviously only where the 
conditions in virtual reality and in real world were the same (e.g., not strongly affected by the traffic or other 
events). 


3.2 Absolute validity 


The absolute validity was evaluated only in the road segments where the same conditions were present (1, 3, 5, 8 
and 10). Therefore, segments highlighted in yellow and in red were not considered in the statistical analysis (see 
Figure 6). Statistical analyses were carried out considering the speed values recorded in the virtual reality and on- 
fields tests for a given point in the travelled path. The analysis was repeated in 50 points equally spaced out 
(approximately every 30 m). In Table 3 the results obtained are summarized. 


Table 3: statistical test results 


; : Hs 
Distance Shapiro-Wilk’s test U Mann 
Levene’s iya t-Student d-Coh Wiii Reig 
(m) Virtual On-field ia -value “valu -Cohen ithney esults 
reality p-value 
112.0 0.84 0.836 <0.001 -3.941 <0.001 0.751 - HO rejected 
141.5 0.019 0.946 0.191 -3.115 0.001 0.751 <0.001 HO rejected 
170.5 0.03 0.635 0.072 -1.47 0.146 - 0.079 HO 
accepted 
200.0 0.019 0.738 0.112 0.591 0.556 - 0.813 HO 
accepted 
229.0 0.01 0.815 0.886 4.957 <0.001 1.195 <0.001 HO rejected 
329.0 0.049 0.303 0.003 2.076 0.041 0.431 0.178 HO 
accepted 
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358.5 0.126 0.006 0.004 2.032 0.046 0.398 0.185 

388.5 0.523 0.101 0.003 0.589 0.558 - - 

418.0 0.043 0.148 0.123 0.887 0.378 - 0.546 

447.5 0.130 0.138 0.211 1.624 0.109 - - 

477.0 0.305 0.481 0.098 1.981 0.051 - - 

507.0 0.014 0.859 0.025 2.78 0.007 0.579 0.024 HO rejected 
536.5 0.006 0.198 0.042 3.154 0.002 0.656 0.01 HO rejected 
566.0 0.021 0.158 0.014 2.435 0.017 0.494 0.095 

596.0 0.005 0.446 0.007 2.237 0.028 0.439 0.178 

625.5 0.015 0.081 0.004 2.018 0.047 0.394 0.224 

655.0 0.008 0.234 0.006 1.743 0.085 - 0.327 

684.5 0.331 0.586 0.008 1.462 0.148 - - 

714.5 0.102 0.716 0.044 1.638 0.106 - - 

744.0 0.027 0.402 0.214 1.494 0.139 - 0.301 

773.5 0.003 0.244 0.175 1.943 0.056 - 0.140 

803.0 <0.001 0.578 0.048 2.433 0.017 0.500 0.064 

833.0 0.008 0.015 0.006 2.339 0.022 0.456 0.049 HO rejected 
862.5 0.109 0.245 0.002 1.438 0.155 - - 

892.0 0.053 0.434 0.005 1.400 0.166 - - 

921.5 0.064 0.273 0.006 1.109 0.271 - - 

951.5 0.073 0.780 <0.001 -0.419 0.676 - - 

981.0 0.716 0.771 <0.001 -3.833 <0.001 0.73 - HO rejected 
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1077.0 0.631 0.037 0.06 -8.423 <0.001 2.005 <0.001 HO rejected 


1108.5 0.216 0.02 0.163 -3.442 <0.001 0.819 0.039 HO rejected 
1140.0 0.168 0.004 0.048 -2.075 0.041 0.433 0.029 HO rejected 
1171.0 0.393 0.011 0.009 -2.075 0.041 0.422 0.042 HO rejected 
202.5 0.111 0.258 0.015 -1.702 0.093 - - HO 
accepted 
234.0 0.805 0.969 0.011 -2.563 0.012 0.519 - HO rejected 
265.5 0.488 0.299 <0.001 -4.871 <0.001 0.970 - HO rejected 
297.0 0.021 0.878 0.001 -4.966 <0.001 1.017 <0.001 HO rejected 
328.5 0.0364 0.133 0.031 -2.99 0.004 0.619 0.007 HO rejected 
359.5 0.551 0.132 0.12 -0.528 0.599 - HO 
accepted 
1391.0 0.363 0.005 0.433 1.925 0.058 0.084 HO 
accepted 
1422.5 0.722 0.054 0.371 2.87 0.005 0.683 - HO rejected 
1454.0 0.299 <0.001 0.444 3.369 0.001 0.802 <0.001 HO rejected 
1485.5 0.534 0.027 0.280 4.250 <0.001 1.011 <0.001 HO rejected 
1799.0 0.058 0.007 0.741 -6.624 <0.001 1.577 <0.001 HO rejected 
1831.5 0.181 0.007 0.004 -4.413 <0.001 0.860 <0.001 HO rejected 
1863.5 0.232 0.029 <0.001 -3.064 0.003 0.607 0.012 HO rejected 
1896.0 0.118 0.865 0.319 0.319 0.751 - - HO 
accepted 
1996.0 0.060 0.031 0.207 2.785 0.007 0.663 0.021 HO rejected 
2022.5 0.004 0.244 <0.001 1.022 0.310 - 0.948 HO 
accepted 
2048.5 0.021 0.004 <0.001 2.058 0.043 0.401 0.140 HO 
accepted 
2075.0 0.036 0.410 0.043 4.538 <0.001 0.944 <0.001 HO rejected 


Note: boldface indicates statistically significant values with 5% level of significance. 


The results in terms of p-value were also depicted in Figure 7. The check describes the result each 0.5 m (for the 
considered road segment). In blue the segments where the HO was accepted, that indicate the absolute validity. In 
grey the opposite result. Where the curve trends were similar, a relative validity can be found, but the absolute 
validity sometimes is not obtained. However, only the segments close to those strongly affected by traffic (in real 
world) or events where the pedestrian crosses the street (in virtual reality) present different curve trends and 
therefore, different drivers’ behavior (HO rejected). 
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Fig. 7: results of the statistical analysis (p-value estimated each 0.5 m) 


Absolute validity was also found in the road segment where a traffic calming measure is present, e.g., around a 
distance equal to 350 m, 700 m and 900 m where raised intersection, chicane and lane narrowing and raised 
pedestrian crossing are respectively present. Therefore, the virtual reality study allows to obtain a good description 
of the drivers’ behavior, also in presence of safety countermeasures, partially confirming the finding described by 
Branzi et al., 2017 with reference to the street before the reconfiguration intervention (i.e., without traffic calming 
measures). The absolute validity was demonstrated in more than 50% of the entire road segment analyzed. 


3.3 Regression analysis for validity 


Finally, the regression analysis has been carried out. Figure 11 shows the regression result of the overall street. 
Table 4 shows instead the R? values obtained analyzing segment by segment, as in the previous paragraph. A good 
correlation among the speed values recorded during the two experimentations was highlighted in segment n.3. 
Lower R? values were instead determined in the other areas. 


60 - 


w 
a 


th 
> 


On-field speed [km/h] 
D 
w 


25 30 35 40 45 50 55 60 65 
Simulator speed [km/h] 


* V-85 —V-mean ~-~ Linear(V-85) = = Linear (V-mean) 


Fig. 11: regression analysis for validity (average speed and V85 speed) — overall path 
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Table 4: R? results — regression procedure for validity 


R? 
Segment ID 
Average speed V85 
1 0.0004 0.0080 
3 0.8881 0.9124 
5 0.0306 0.3976 
8 0.1358 0.5034 
10 0.6451 0.3616 


The regression analysis showed very good correlation in the segments n. 3. Low values of R? were instead found 
in the other segments. The overall result showed R? close to 0.5 both for the average and V85 speed. The result 
obtained confirmed those obtained in the statistical analysis. It demonstrates also that close to the road segments 
interested by different “contingent conditions” the validity of the simulation can be only relative or null. 


4. CONCLUSION 


The analysis conducted allows to observe that drivers’ behavior in virtual reality generally differs from the driver 
behavior in real world due, firstly, to traffic conditions and, secondly, to the different “stimuli’s’ perception” due 
to the fidelity of the driving simulator environment. The research conducted has demonstrated that events such as 
pedestrian crossing the street in driving simulator experimentation or real traffic conditions strongly affected the 
drivers’ behavior. Therefore, to compare virtual reality and on-site experimentation, the same “events” and “traffic 
conditions” are needed. 


The analysis conducted in virtual reality was evaluated both in terms of relative and absolute validity through a 
statistical test conducted on the entire road stretch observed and interested by different traffic calming measures. 
In the end also a regression analysis was made to confirm the result obtained. The two average speed profiles 
(obtained by virtual reality and on-field tests) presented a similar trend, maximum and minimum speed were 
reached in the same section if the “contingent conditions” of the experimentation were the same. In this sense, the 
qualitative analysis of the speed profile allows to define the relative validity of the simulation. Moreover, absolute 
validity was demonstrated in more than 50% of the road section analyzed. Therefore, the analysis conducted allows 
to demonstrate that the driving simulator study can be relevant to analyze the effectiveness of safety treatments 
before their implementation on real road. Moreover, this type of analysis allows the road engineering and Road 
Authorities to select the best engineering treatment as a function of the objective of the intervention. 


It can be concluded that the LaSIS driving simulator can be considered as a valid research tool for studying the 
factors affecting the drivers’ behavior and the effectiveness of the different traffic calming measures, confirming 
also the results obtained in previous research, when the same street was analyzed before the implementation of the 
safety intervention. The research highlights also the need to check in the “conditions evaluated” that can be quite 
different in virtual reality and on-field and affect the real drivers’ behavior. 
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ABSTRACT: An effective design review is critical to identifying changes and/or errors at the early stage of 
construction projects and reduce the project costs. Traditionally, design reviews are conducted by reviewing the 
project by reading multiple drawings. The inherent demands of reading project drawings are especially 
challenging for entry-level built environment learners who often need professional experience and may need more 
training and skills to fully understand technical representations. Previous research has focused on evaluating the 
impacts of interactive visualization technologies, such as virtual reality, on the learners’ design review thinking 
skills and showed how such technologies could support learners and industry professionals in performing design 
reviews. However, such research has yet to assess its impacts on their self-efficacy in engaging in design review 
thinking skills. Self-efficacy can be defined as one's perception of their ability to perform a task, such as problem- 
solving and evaluation. To understand how the VR technology can support learners in increasing their self- 
efficacy in performing design reviews, the researchers hosted a pilot study to evaluate immersive virtual reality 
design reviews' impacts. Based on the results of this pilot study, the implementation of immersive virtual reality 
has the potential to positively impact first year-built environment learners’ self-efficacy in performing design 
reviews. 


KEYWORDS: Virtual Reality, Self-Efficacy, Motivation, Education, Built Environment, Design Review 
1. INTRODUCTION 


Experimental learning through hand-on experience is key for the student to gain sufficient knowledge and skills 
about construction and built environment subjects. However, traditional teaching, which takes place in confined 
classrooms with occasional aid of online or web-based learning material (Fadol et al. 2018), cannot provide 
students with such experience. Inaccessibility to construction sites is a main challenge for experimental learning 
in the construction and built environment field (Ogunseiju et al. 2021). Over the past years, educators strived to 
use educational technologies to enhance the learning experience of the students. Virtual reality is one of these 
technologies that has drawn the attention of educators in different fields such as medicine (Duarte et al. 2020), 
Chemistry (Kader et al. 2020), and art (Serafin et al. 2016). Compared with traditional teaching, training using 
VR can stimulate students! interest in learning, and promote students' active learning while saving teaching costs 
and avoiding safety risks (Ding and Li 2022). VR-based learning can improve self-efficacy and motivation of the 
learners as it allows students to interact with a virtual environment resembling the actual environment and make 
experiments in a risk-free environment. Past studies showed the effectiveness of VR technologies for teaching 
different subjects in the Architecture, Engineering and Construction (AEC) industry such as infrastructure 
management (Arif 2021), earthquake-resistant construction (Kuncoro et al. 2023), offsite construction (Goulding 
et al. 2012) and safety (Le et al. 2015) (Le et al. 2014). 


Design review is one of the critical tasks in the AEC industry because identification of design changes and/or 
errors at the early stage of construction projects can significantly reduce the project costs. This task requires the 
participation of various stakeholders and interpretation of multiple engineering drawings, which is often 
challenging for entry-level built environment learners who do not have professional experience. Therefore, they 
need training to fully understand technical representations in the drawings and gain pertinent skills in design 
review. VR-based learning has been used effectively for educating students in design review and enhance their 
thinking skills (Kandi et al. 2020). However, no research has assessed the impact of this approach on the self- 
efficacy of learners and engaging them in design review. To bridge this gap, this research aims to understand and 
evaluate how VR can support learners in increasing their self-efficacy in performing design reviews. To this end, 
a pilot study was conducted with students at the undergraduate level for teaching design review practices. The 
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results of the study were then analysed, and the limitations of the research along with directions for future research 
are presented. 


2. LITERATURE REVIEW 
2.1. Immersive Virtual Reality in the Built Environment 


Virtual Reality (VR) has several potential applications in many fields, including medicine, engineering, education, 
and entertainment (Hamad and Jia, 2022). VR could be defined as a technology that simulates an environment, 
which can be interacted with in a manner that appears real or tangible (Sanni Hafiz Oluwasola and Ayinde Munir 
2015). The origins of VR technology can be traced back to the mid-1970s, with early experimenters using phrases 
like "artificial reality" to describe it (Machover and Tice 1994). The term "virtual reality" was coined by Jaron 
Lanier, founder of VPL Research, and the technology has since evolved from user interface design, flight and 
visual simulation, and telepresence technologies (Machover and Tice 1994). Freina and Ott (2015) identifies that 
there are two types of VR: immersive and non-immersive VR. Non-immersive VR is a computer-based 
environment that simulates places in the real or imagined world, while immersive VR gives the perception of 
being physically present in the non-physical world. Both types of VR are becoming more user-friendly and 
economically accessible (Kandi et al. 2020). Kaplan-Rakowski and Gruber (2019) further divided immersive VR 
(IVR) into low-immersion virtual environments (LiVR) and high-immersion three-dimensional spaces (HiVR). 
LiVR is a computer-generated three-dimensional virtual space experienced through standard audio-visual 
equipment, such as a desktop computer with a two-dimensional monitor. The ubiquitous online virtual world 
Second Life is an example of a LiVR environment. HiVR is defined as a computer-generated 360-degree virtual 
space that appears spatially realistic due to the high immersion afforded by a head-mounted device (Kaplan- 
Rakowski and Gruber 2019). 


The technology has also undergone significant advancements in recent years, with new displays, input devices, 
and technologies being developed and introduced to the market (Anthes et al. 2016). VR has the capability to 
present spatial information in a more engaging manner, allowing for interaction with designed spaces at a human 
scale (Pacheco et al. 2014). Large screen size and wide field of view are key features of immersive VR systems, 
while texture, lights, shadows, and objects contribute to the overall VR experience. These VR attributes can further 
augment the richness of information and enhance the visualization process VR is being increasingly used in 
architecture, engineering, and construction to support experiential learning, movement through space and time, 
and interaction with the design (Sala 2013). VR enables a more qualitative representation of spaces from the users' 
perspective, creating the illusion of depth and immersion (Castronovo et al. 2013). 


VR technology has the potential to impact the way users conventionally think and design the built environment 
promoting it beyond space and time constraints (Paranandi and Sarawgi 2002). The applications of the VR 
technology in Architecture, Engineering and Construction (AEC) industry are extensive, particularly in simulating 
environments and creating the feeling of immersion in a virtual world, which can assist architects and engineers 
in evaluating designs, understanding the needs of different users, especially those who are older or disabled. It can 
also promote inclusive design by providing in-depth insight into how particular groups of people experience the 
designed environment and how they interact with it. Nikolić and Whyte (2021) argue that VR can be used as a 
platform for an interdisciplinary integration of the allied design, social, and environmental disciplines. VR 
technology can provide easy-to-use communication solutions for all stakeholders in the AEC industry by 
providing a computer simulated environment with visual, auditory and haptic channels (Kähkönen 2003). VR 
technologies and computers are being utilized to facilitate the planning and construction of the built environment 
by aiding in the visualization and simulation of proposed designs, evaluating the visual impact of urban designs, 
and exploring broader economic ramifications (Whyte 2003). Tytarenko et al. (2023) reconstructed the Kilburun 
Fortress by monitoring the object's territory, analysing archival, librarian, and cartographic sources, and using 
various software tools such as AutoCAD, SketchUp, Quixel, and Twinmotion for modelling, rendering, and 
visualization. The resulting 3D model can be integrated into ArchiCAD and Revit software and showcases the 
applications of VR in the built environment. Although virtual reality environments (VRE) have enormous 
potential to engage students in classrooms and aid in construction workers' retention of safety knowledge, the 
adoption of VRE in the AEC industry remains minimal, as safety professionals still prefer hands-on training (Bhoir 
and Esmaeili 2015). 


VR adoption faces several challenges in architecture and design, including a lack of integrated 3D databases and 
accurate reality models, technical limitations such as precise monitoring and remote sensors, and challenges in 
education such as the uniqueness effect, cybersickness, and accessibility (Fakahani et al. 2022). VR is also limited 
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in its ability to create a real-life experience, particularly in conveying appropriate social behaviours and creating 
convincing virtual characters (Fakahani et al. 2022). Overcoming these challenges requires interdisciplinary 
efforts and finding the appropriate level of intervention (Fakahani et al. 2022; Hajirasouli et al. 2023; Lach et al. 
2020; Zhang et al. 2020). Zhang et al. (2020) proposed future research directions on using the VR technology for 
the built environment, including user-centred adaptive design, attention-driven virtual reality information system, 
construction training system incorporating human factors, occupant-centred facility management, and industry 
adoption. 


2.2. Immersive Virtual Reality in the Built Environment Education 


Current practices in the built environment education often fall short of adequately preparing students for the 
complexities and challenges of real-world professional settings (Afacan 2023; Gardner 2022). Conventional 
teaching methods, which rely largely on textbooks and two-dimensional representations such as drawings and 
photographs, may fail to convey the multidimensional and dynamic nature of the built environment (Stewart and 
Baker, 2019). Students may therefore struggle to comprehend spatial relationships, scale, and materiality, limiting 
their comprehension of the built environment (Stewart and Baker, 2019). In addition, limited access to job sites 
and real-world initiatives hinders students! ability to gain hands-on experience and understand the practical 
implications of their decisions (Gibbons et al., 2021). 


These limitations underscore the pressing need for innovative approaches that bridge the gap between theory and 
practice, equipping students with the necessary competencies to thrive in the complex-built environment. 
Consequently, situated learning is crucial to this field of study, as learning about the built environment requires 
exposure to a vast array of historical structures and design conditions (Afacan 2023) Situated learning in the built 
environment refers to an educational approach that emphasises learning within authentic and pertinent real-world 
contexts. It entails actively engaging students in tasks and challenges that replicate the complexities and 
requirements of their future professional roles (Bakhteyari et al., 2018; Aggerholm and Misfeldt, 2016). As a 
means of providing students with hands-on experience, conventional instructional methods utilise various case 
studies. Steele et al. (2023) assert that case studies are crucial for the education of those who are involved with 
the built environment. Thus, analysing completed projects allows students to examine the design process, 
construction techniques, and challenges faced by professionals, facilitating the development of problem-solving 
abilities and exposure to different design strategies (Hjaltadottir et al., 2018). However, when using case studies, 
the traditional teaching style encounters difficulties. To begin with, there are difficulties in comprehending projects 
through drawings and images, which limits students' grasp of spatial relationships, scale, and materiality (Stewart 
and Baker, 2019). Furthermore, limited access to job sites prevents students from firsthand experiencing the built 
environment, limiting their comprehension of project context and restrictions (Gibbons et al., 2021). Furthermore, 
student visits to construction sites are hampered by safety concerns, limiting exposure to construction processes 
and site-specific difficulties (Liu et al. 2019). 


To address these challenges, the emergence of virtual reality (VR) technology holds promise in enhancing situated 
learning and overcoming the limitations of traditional methods (Elghaish et al. 2021). The advent of VR tools has 
enabled individuals to begin experimenting with and employing VR (Liu et al. 2019). According to Elghaish et 
al. (2021), VR is a revolutionary technology that has the potential to improve construction design processes as 
well as promote education in the built environment. VR provides an immersive platform that enables students to 
virtually explore case study projects in three dimensions. By experiencing projects from various perspectives, 
students gain a deeper understanding of spatial layout, scale, and design intent (Ho et al., 2018). VR simulations 
enhance visualization by offering realistic representations of projects, allowing students to interact with virtual 
elements and observe the dynamic behaviour of structures or building systems (Lee et al., 2020). Furthermore, 
VR in the built environment education can help researchers and students simulate real-life, potentially hazardous 
situations without exposing them to actual danger, making VR experimentations a credible approach for resolving 
construction clashes and defects (Afzal and Shafiq, 2021). Also, learners can efficiently investigate contextual 
dimensions associated with a building project using VR by adjusting specific parameters, making variable control 
simple to achieve (Xu and Zheng, 2020). Young et al. (2021) postulated that VR allows users to have an immersive 
experience of construction projects, so they can react effectively in real-world situations. Because participants in 
a lively, evolving situation can better try to remember safety knowledge, technological tools are also applicable 
in fields such as building risk assessments and safety training (Zhu and Li 2021). 


Amidst the propagation of VR in the built environment, its maximum potential is yet to be realized (Safikhani et 
al. 2020), and there is a significant gap in VR involvement in the built environment between academia and industry 
(Delgado et al. 2020). 
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2.3. Self-Efficacy and Motivation 


Self-efficacy is a belief in one's ability to perform well on a task and is a core concept of Social Cognitive Theory 
(Bandura 1997). Bandura (1997) indicated that self-efficacy is a key determinant of behavioural change, and 
psychological procedures can alter the level and strength of self-efficacy. Self-efficacy is an important factor in 
motivation, achievement, and accomplishment, and can be influenced by a variety of factors, including 
personality, motivation, and the task itself (Bandura 1997). Those with a high level of self-efficacy are not only 
more likely to succeed, but they are also more likely to bounce back and recover from failure (Resnick 2008). 


Schunk (2011) describes how self-efficacy influences choice of activities, effort, persistence, and achievement, 
and how interventions involving models, goal setting, and feedback can affect self-efficacy. Eliyana et al. (2020) 
found that self-efficacy of entrepreneurial students influences achievement, and motivation significantly mediates 
the effect of self-efficacy on entrepreneurial achievement. Rodriguez et al. (2014) found that teachers with 
intermediate self-efficacy perception have more learning-oriented students than teachers with high self-efficacy, 
and students of teachers who are overconfident of their teaching capacity seem to engage less in studying to learn. 
Overall, self-efficacy is an important factor in motivating individuals to achieve their goals. 


It is well documented that self-efficacy has a positive effect on motivation. Kanfer (1990) described motivation 
as the “psychological forces that determine the direction of a person’s level of effort and a person’s level of 
persistence in the face of obstacles”. There are two types of motivation namely: extrinsic and intrinsic motivation 
(Schunk 2011). Ryan and Deci (2000) defined intrinsic motivation as the natural human propensity to learn and 
assimilate, while extrinsic motivation can either reflect external control or true self-regulation. Benabou and Tirole 
(2003) reconciles the economic view that individuals respond to incentives with the psychological view that 
rewards and punishments can undermine intrinsic motivation. Kuvaas et al. (2017) found that intrinsic motivation 
was associated with positive outcomes, while extrinsic motivation was negatively related or unrelated to positive 
outcomes. Overall, intrinsic motivation is driven by internal factors, while extrinsic motivation is driven by 
external factors, and that the two types of motivation can have different effects on outcomes. 


Motivation is crucial in education and can fosters creativity and critical thinking as it cultivates resilience and 
self-assurance, and improves a student’s agency (Schunk 2011). Motivation to learn provides direction, 
enthusiasm, and persistence in learning (Alfiah et al. 2021). Motivation is important for both students and teachers 
in achieving desired outcomes in education (Shrestha 2020). The theories of Maslow's hierarchy of needs and 
expectancy theory shed light on the fundamental aspects of motivation in the context of learning (Shrestha 2020). 
The application of various methods to motivate students and teachers is crucial and should be tailored to specific 
situations and requirements, as there is no universal approach (Shrestha 2020). The motivation of both teachers 
and students holds significance in ensuring an effective teaching and learning process within the field of education 
(Shrestha 2020). 


Low motivation is associated with poor academic performance. Not being motivated was found to be associated 
with higher levels of stress and a lower Grade Point Average (Rticker 2012). Motivation is a key factor in effective 
school functioning and academic achievement (Halawah 2006). Fortier et al. (1995) proposed and tested a 
motivational model of school performance, which found that autonomous academic motivation positively 
influenced school performance. 


Low motivation in education can be caused by various factors. Economic factors, low employment prospects, and 
educational background can contribute to low motivation in college students (Sahib, 2020). Intrinsic factors, such 
as a lack of interest in learning activities and embarrassment, can cause low motivation in elementary school 
students (Alfiah et al. 2021). A lack of interaction between teachers and students, as well as low reading ability, 
can contribute to low motivation in middle school students (Alfiah et al. 2021). O’Neil et al. (1995) found that 
offering financial incentives can increase student effort and improve test scores, suggesting that low motivation 
may be due to a lack of consequences or stakes attached to performance. Therefore, addressing factors such as 
economic status, interest in learning activities, teacher-student interaction, and incentives may help improve 
motivation in education. 


Educators can increase students' motivation in the classroom by employing various strategies. Pahlavannezhad 
and Nejatiyan (2013) found that early knowledge of the course syllabus and assessment, rewards and positive 
reinforcement, and group work and role play can increase students’ motivation to learn English. Brophy (2013) 
suggested that teachers can establish their classes as collaborative learning communities, support their students’ 
confidence as learners, and help them appreciate curricular content as worth learning and applicable to their lives 
outside of school. Williams-Pierce (2011) identifies five key factors impacting student motivation: student, 
teacher, content, method/process, and environment, and provides suggestions from each area that can be used to 
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motivate students. Mart (2011) suggested strategies such as providing a positive learning environment, using 
technology, and incorporating student interests to sustain students’ classroom motivation. 


Motivation has a positive impact on academic performance. Haider et al. (2015) found that both intrinsic and 
extrinsic motivation had a positive impact on students' academic performance. Goodman et al. (2011) found that 
intrinsic and extrinsic motivation were significantly related to academic performance, and that effort mediated 
this relationship. Afzal et al. (2010) also found that both intrinsic and extrinsic motivation had a positive impact 
on academic performance. Fortier et al. (1995) proposed and tested a motivational model of school performance, 
finding that perceived academic competence and perceived academic self-determination positively influenced 
autonomous academic motivation, which in turn had a positive impact on school performance. Overall, these 
papers suggest that motivation is an important factor in academic performance. 


3. METHODOLOGY 
3.1. Research Questions 


As highlighted in the literature review, the implementation of immersive virtual reality (IVR) in built environment 
(BE) requires further research. New research efforts must investigate the impacts that the implementation of this 
technology has on learners’ motivation, and specifically self-efficacy. Furthermore, as highlighted in section 3.2. 
research in the implementation of IVR in BE education necessitates investigations that leverage rigorous 
experimental methods. Based on this research gap, the researchers set out a goal of assessing the impacts that an 
IVR learning activity has on higher education learners in the BE discipline. Therefore, the research was 
concentrated on measuring the impacts of this activity on learners’ self-efficacy and experience. 


Based on these goals, the following research questions were posed: 


1. Does performing design reviews with immersive virtual reality lead learners to have a higher self- 
efficacy? 
2. Is the learners’ experience positive whilst performing design reviews with immersive virtual reality? 


Based on these research questions, two null hypotheses were considered. The first hypothesis was that the learners’ 
reported average self-efficacy was going to be the same before and after the learning activity. Meanwhile, the 
second null hypothesis was that the learners’ average experience was going to be neutral. 


3.2. Experimental Design and Procedure 


To test the hypotheses, the researchers designed a pilot study to test the early impacts of IVR. This pilot study was 
designed as a one group pre-test / post-test quasi-experiment to assess the impacts on learners’ self-efficacy. This 
pilot study is considered a quasi-experiment due to the lack of randomized assignment to the treatment group. 
This method is not optimal for the generalization of the data and the impacts of the learning activity might be due 
to test learning. However, this method is acceptable for running pilot studies and assessing early impacts and 
identifying early trends (Knapp 2016). 


In this pilot study the independent variable or treatment had only one level. This level was the learning activity 
designed to introduce first year BE learners to the concepts of design reviews while using IVR. The dependent 
variable measured for this pilot study was the learners’ self-efficacy. The assessment instrument designed to 
measure the learners’ self-efficacy is explained further in section 3.4. The procedure for this quasi-experiment can 
be seen in Figure 1. The participants were students and took part in this learning activity as part of their class time. 
The students had to participate in the activity; however, they were given the option not to participate in the pre 
and post-tests. A 10-minute pre-test was administered before the start of the activity. The same 10-minute test was 
administered as a post-test after the activity. 
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Figure 1. Quasi-experimental Procedure 


After the pre-test the participants were provided with a tutorial on how to use the IVR head-mounted system. This 
tutorial allowed the participants to get familiar with the virtual environment and navigation controls. This tutorial 
lasted about 10 minutes. Participants that still had difficulty with the controls were provided with additional 
support and instruction. After the tutorial the participants were asked to navigate through a BIM model of a 
residential house and identify design mistakes. This residential house was designed in Autodesk Revit to have 
several design mistakes. The participants were paired in groups of two, where one was assigned the role of “driver” 
and the other of “note-taker”. The driver would wear the headsets, which were casting its video output also to a 
desktop computer screen. The note-taker was tasked to write down design mistakes that the driver was identifying, 
on a sheet provided by the team. For every five design mistakes the participants had to switch roles. The total 
duration of the design review was set to be 30 minutes. This time limitation was also set to limit potential motion 
sickness of the participants. 


3.3. Participants 


The participants in the pilot study were first-year students enrolled in an Introduction to the Built Environment 
module at a university in the south of the United Kingdom. A total of 54 students were enrolled in the module and 
were asked to participate in the study. Ethical approval was received by the university ethic board to perform the 
study, and students were asked for consent to collect the data before administering the pre-test. As mentioned 
earlier, students that did not want to participate in the study were given the option of not answering the pre and 
post-test, but still had to participate in the class activity. 


3.4. Equipment and Assessment instruments 


The research team used twenty IVR headsets, the Meta Quest 2. This allowed a total of 40 students to participate 
in the activity. The video output of the headsets was cast on desktop computer screen over the Chrome web 
browser. To host the design review sessions the researchers used the IVR Arkio® software. Arkio allows users to 
host collocate in a virtual environment and perform design review sessions collaboratively. Learners were 
provided with virtual meeting rooms where the model of the residential house was preloaded. 


The measured dependent variable was the participants’ self-efficacy and experience. The pre-test was composed 
of ten questions. The questions were based on the "General Self-Efficacy Scale" (GSS) developed by Schwarzer 
& Jerusalem (1995). The questions from the GSS instrument were slightly changed to ask the participants about 
their perceived ability to perform design reviews. The post-test was composed of the same ten questions from the 
pre-test as well as an additional eleven questions to capture the learners’ experience. These questions were based 
on the instrument developed by Boekaerts (2002), the OnLine Motivation Questionnaire. The participants were 
asked to indicate their agreement with the statements on a 5-point Likert scale. The questions for the pre and post- 
test can be found in Table 1. 


Table 1. Pre and Post Tests Questions 


Pre and Post Tests Self-Efficacy Questions Post Test Participant Experience 

1. How good do you think you are at reviewing design 11. How do you feel just after finishing the 
proposals? activity? 

2. I am able to review most of the design proposals that I 12. How easy was this activity? 


am presented with. 


3. When facing a difficult design proposal, I am certain that 13. How well do you think you did in this activity? 
I can review it. 
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4. In general, I think that I can successfully review most 4. How useful do you consider this activity in 

design proposals. learning about design reviews? 

5. I believe I can succeed at reviewing most design 5. How important do you find it to do well on 

proposals. design reviews? 

6. I am able to successfully review design proposals. 6. I felt the time used for the activity was 
beneficial. 

7. I am confident that I can perform effectively design 7. I saw the value in the activity. 

reviews. 

8. Compared to other people, I can do most review design 8. How enthusiastic were you about this activity? 

proposals very well. 

9. Even when the design of a building is complex, I can 9. How pleasant did you find this activity? 


review it quite well. 


10. How difficult did you find the topic of design reviews? 20. How much did you enjoy yourself during this 
activity? 


21. How much would you recommend this activity 
to your classmates? 


4. RESULTS AND ANALYSIS 


In this research, the research questions were answered by conducting two different types of statistical analyses. 
The first research question, which aimed to evaluate if the learning activity supported the participants in gaining 
higher self-efficacy, was tested by conducting a Paired-Sample T-Test. The second research question aimed at 
evaluating the participants’ experience while participating in the activity. This question was evaluated by 
performing a One-Sample T-Test. This analysis aimed to test if the participants’ experience was higher than neutral 
towards the positive. The tests were conducted using the statistics software package, IBM SPSS Statistics. A 
summary of the results can be found on Table 2 for the Pre and Post-Test, and on Table 3 for the participants’ 
experience. While a total of 54 participants were recruited, only a total of 29 data points were used for the study 
as the rest either did not participate in activity or did not give permission to use the data. 


Using the results from Table 2, a Paired-Sample T-Test was conducted to test if there was significant difference 
between the pre and post-test for the participants’ self-efficacy. No outliers were detected that were more than 1.5 
box-lengths from the edge of box in a box plot. Participants reported a higher self-efficacy after participating in 
the activity (3.890 + 0.440) when compared to before the activity (3.190 + 0.0.498), a statistically significant 
increase of 0.70 (95% CI, 0.885 to 0.536), t(28) = 8.042, p < .0001, d = 1.493. Therefore, the researchers 
confidently rejected the null hypothesis that the averages for the pre and post-test were going to be the same, as 
indicated by the significant difference and p value being below 0.0001. Furthermore, the effect size of the sample 
is quite large as indicated by the Cohen’s D being 1.493 (Cohen 1988). 


Using the results from Table 3, a One-Sample T-Test was conducted to test if the participants’ experience was 
significantly different than neutral (Likert Scale value of 3). No outliers were detected that were more than 1.5 
box-lengths from the edge of box in a box plot. The mean participants’ experience score was significantly higher 
by 0.998 (0.95 CI, 0.7502 to 1.2205) than a neutral score of 3, t(28) = 8.792, p < 0.0001, d = 0.606. Therefore, the 
researchers confidently rejected the null hypothesis that the average for participants’ experience was going to be 
neutral, as indicated by the significant difference and p value being below 0.0001. Furthermore, the effect size of 
the sample is between medium and large as indicated by the Cohen’s D being 0.606 (Cohen 1988). 


Table 2 — Pre and Post Test Average Answers 


; Pre-Test Post-Test 
Question 


Average Average 
1. How good do you think you are at reviewing design proposals? 3.14 3.90 
2. I am able to review most of the design proposals that I am presented with. 3.43 4.03 
3. When facing a difficult design proposal, I am certain that I can review it. 2.90 3.79 
4. In general, I think that I can successfully review most design proposals. 3.41 4.07 
5. I believe I can succeed at reviewing most design proposals. 3.52 4.07 
6. I am able to successfully review design proposals. 3.28 4.10 
7. I am confident that I can perform effectively design reviews. 3.17 3.86 
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8. Compared to other people, I can do most review design proposals very well. 3.14 3.76 


9. Even when the design of a building is complex, I can review it quite well. 2.97 3.69 
10. How difficult did you find the topic of design reviews? 2.93 3.66 
Self-Efficacy Average 3.190 3.890 

Standard Deviation 0.498 0.440 


Table 3 — Participants’ Experience Average Answers 


Question Average 
1. How do you feel just after finishing the activity? 
1.A How relieved do you feel after the activity? 3.62 
1.B How at ease do you feel after the activity? 3.90 
1.C How nervous do you feel after the activity? 2.97 
1.D How satisfied do you feel after the activity? 4.14 
1.E How worried do you feel after the activity? 3.38 
1.F How confident do you feel after the activity? 3.90 
1.G How concerned do you feel after the activity? 3.34 
2. How easy was this activity? 3.97 
3. How well do you think you did in this activity? 4.00 
4. How useful do you consider this activity in learning about design reviews? 4.24 
5. How important do you find it to do well on design reviews? 4.31 
6. I felt the time used for the activity was beneficial. 4.31 
7. I saw the value in the activity, 4.34 
8. How enthusiastic were you about this activity? 4.07 
9. How pleasant did you find this activity? 4.55 

20. How much did you enjoy yourself during this activity? 4.45 

21. How much would you recommend this activity to your classmates? 4.48 


Participants’ Experience Average 3.998 


Standard Deviation 0.606 


5. CONCLUSION 


The ability to review and evaluate proposed design proposals is a key skill that BE learners must have once they 
graduate from their higher education institution. As discussed in the literature review, several studies have shown 
the positive impacts that IVR has on students’ learning and their ability to meet learning objectives. However, the 
role of instructors is not just about meeting learning objectives, but it is also to support learners in developing 
their self-belief and confidence necessary to enter the industry. 


This research conducted a pilot study to evaluate the effectiveness of using IVR in improving self-efficacy of 
students in engaging in design review thinking skills. The results and analysis have shown that, at a pilot study 
level, an IVR learning activity has potential in impacting students’ self-efficacy while also being a positive 
experience. When looking at the first research question, the learning activity supported students in increasing their 
self-belief that they can perform design reviews while immersed in a virtual reality environment. This pilot study 
can give an early insight into what are the impacts that IVR has in the classroom beyond meeting learning 
objectives, supporting students in their confidence levels. 


The next steps of this research efforts are to mitigate the limitations of sample size by recruiting a larger number 
of participants to support the researchers in scaling the findings and improving the generalization of the results. 
Additionally, one group pre-test / post-test quasi-experiments have experimental limitations and threats to internal 
and external validity. Therefore, the future research can include additional treatments to tackle the validity of the 
results. To address this limitation, the research has already started collecting the impacts a different medium, non- 
immersive VR, has on learners. By adding an additional treatment, a repeated-measure experiment and a two-way 
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mixed ANOVA analysis can be conducted. To conclude, this study reports an initial result of an on-going research 
and the researchers will share further information on the research results in future publications. 
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COGNITIVE DYNAMICS FOR CONSTRUCTION MANAGEMENT 
LEARNING TASKS IN MIXED REALITY ENVIRONMENTS 
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ABSTRACT: Technologies to communicate construction project information (engineering designs, schedules) 
have evolved into a wider range of innovative ecosystems for engineering practices (e.g., cloud-based 3D 
representations and advanced immersive environments). There is a lack of exploration of effective user interaction 
for learning and training in relation to how presented information influences cognition in these ecosystems. The 
presented research investigates the users’ cognitive and attentional differences using the interactive capabilities 
of Mixed reality (MX) technology. The enhanced user-situation interactions are analyzed by measuring cognitive 
dynamics with an emphasis on two processes (attentional focus and cognitive load) in relation to the challenge of 
the engineering learning task— defined by its complexity (limited time frame for observations of the situations, 
number of required observations) and nature (episodic). Cognitive dynamics were measured using an 
electroencephalography (EEG) device that senses electrical activity in response to changing levels of cognitive 
stimuli via electrodes placed on the scalp. Measuring fluctuations in cognitive processing (related to the intensity 
of various task demands) allows associating efforts on semantic information processing for learning and training 
tasks (e.g., walkthroughs for safety checks in job site in MX). The approach enhances opportunities to design 
technology that best adapts to the user needs for engineering practices with an efficient comprehensive 
performance assessment. 


KEYWORDS: Electroencephalography (EEG), Dynamics of attention, Cognitive load, Cognitive processing 


1. INTRODUCTION 


Construction sites are characterized by their dynamic nature, as they are filled with a multitude of activities and 
potential risks. Safety in construction is a critical aspect of production activities and a major priority effort for 
successful implementations in construction organizations (Guo et al., 2017). Construction safety training is of the 
highest priority across the industry, and the use of technology intervention has facilitated such efforts (Frank Moore 
& Gheisari, 2019). The provision of construction safety training plays a pivotal role in cultivating a safety-oriented 
environment within the construction sector. Ensuring optimal safety in the construction industry necessitates a 
collaborative endeavor involving various stakeholders, including owners, designers, construction companies, 
workers, regulators, and educators (Sacks et al., 2013). Typically, prior to commencing work on a construction site, 
workers are mandated to complete an Occupational Safety and Health Administration (OSHA) 10-hour 
construction training program. This program is delivered online and encompasses safety-oriented lectures, videos, 
and slides. 


The efficacy and significance of this training program, as well as its adequacy, are continually pertinent inquiries 
(Wilkins, 2011). There have been efforts focused on implementing a more effective construction safety training 
program using different methods like personalized training programs or training with virtual reality devices 
(Jeelani et al., 2020). However, VR technology implementations generate potential risks for the user. For example, 
they don’t easily enable representing at-scale safety requirements in the VR environments for the users’ own 
exploration in training. Another possibility of risk is that VR applied to OSHA safety training may become a new 
source of distractions to users (Asish et al., 2022), impacting the intended outcome of training. 


Despite the widespread use of technologies in training (including VR and AR as interventions), methods that reveal 
the effectiveness of the technologies as training approaches are not incorporated into the training programs. For 
example, methodologies for assessments employ paper-based exams or supervised self-reports—which have 
considerable limitations— to determine subjects’ performance before and after the training program (Jeelani et al., 
2020). It is critical, therefore, to comprehensively assess the effectiveness of inventory training tasks with the use 
of technology by considering the individual characteristics of the trainee (technology user) as they factor or are 
subrogate into the overall performance. There is a need to find alternative methods to assess the efficacy and 
benefits of implementing safety training interventions due to individual differences and self-report methods' 
disadvantages—ranging from response bias, recall bias, and subjectivity to cultural and language barriers. The 
researchers anticipate that incorporating the users’ individual performance front and center might facilitate a 
smooth path to successful training programs. 
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The presented work uses Mixed Reality (MX) technologies. MX combines real and virtual worlds for the creation 
of environments where users can function and interact in the physical and virtual worlds. MX has facilitated the 
consolidation and analysis of activities in the physical space, such as in production processes involved in the 
manufacturing of goods or services—1.e., the activities for converting raw materials into finished products. The 
co-existing of real and virtual interaction allows connections and reactions between virtual objects and the physical 
space, undoubtedly facilitating the enhancement of the study of construction activities, including training programs 
in the industry. For example, by “moving” the construction site production activities into just a small physical 
space for training. 


This study presents a novel assessment method utilizing electroencephalography (EEG) technology. The EEG is 
used to measure electrical activity in the brain, which output data brings insights into the timing and nature 
(rhythms in brain activity across frequency bands— delta, theta, alpha, beta, and gamma) of the underlying 
cognitive processes. The presented research proposes the utilization of the EEG technique for safety training and 
problem-solving tasks. The method enables the automatic collection of data from individual trainees to study the 
effectiveness of training tasks. The approach collects the neural response when learners attempt to address 
challenges in training tasks under conditions of complexity, such as the cognitive effort involved in solving a 
question. By conducting an analysis of the EEG data, this technique provides information to assess the training or 
problem-solving tasks through cognitive load and attention degree analysis for individuals, providing information 
on which task the user performs well or has deficits. The outcome can easily correlate to individual differences 
(cognitive abilities such as memory, attention, perception, and problem-solving skills) to find the effectiveness of 
the overall training tasks. For example, what is the impact of attention deficits on particular training tasks? 


The presented study focuses on the problem-solving process in order to develop a more effective, precise, and 
unbiased approach to assessing performance in training. 


2. BACKGROUND 


Since the early 20th century, scientific studies using electroencephalography techniques have experienced a 
considerable evolution. Mainly these efforts involve the detection and analysis of minuscule electrical signals 
emanating from the human brain during its various activities (Sanei & Chambers, 2007). Electroencephalogram 
(EEG) signals are categorized into distinct power bands corresponding to various brain wave frequencies, 
facilitating the identification of different states or conditions of ongoing brain activity in humans (see Table 1) 
(Fernandez Rojas et al., 2020; Klimesch, 1999; Zietsch et al., 2007). 


Table 1: Frequency and statements of brain waves. 


Brain Waves Frequency (Hz) Statement 
Delta 0.5-4 Idling and sleep 
Theta 4-8 Mental fatigue and mental workload 
Alpha 8-13 Mental workload, cognitive fatigue, and attention or alertness 
Beta 13-30 Visual attention, short-term memory, and working memory 


The literature has established the meaning of spectral powers of various EEG waves and cortical locations in 
evaluating cognitive load during problem-solving tasks. Researchers observed an increase in the power of both 
theta and alpha bands as task difficulty escalated, suggesting a direct association between these bands and cognitive 
load (Sarailoo et al., 2022). More specifically, the augmentation of theta spectral power serves as an indicator not 
only of heightened task complexity but also of enhanced working memory capacity (Borghini et al., 2012). 
Additionally, the beta band can potentially serve as another indicator of cognitive load and working memory during 
tasks. In visual working memory tasks, there has been an observed augmentation in beta activity within the parieto- 
occipital channels (Mapelli & Ozkurt, 2019). 


Building on early definitions of attention from William James, in his book the Principle of Psychology, James 
states that attention “is the taking possession by the mind, in clear, and vivid form, of one out of what seems several 
simultaneously possible objects or trains of thought.”(James, 1890). As a condition of selective awareness, 
attention degree controls the quality of one's task-solving. Enhancing one's ability to regulate attention pertains to 
the domain of executive attention, also known as controlled attention. This cognitive process encompasses 
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functions such as planning, decision-making, and problem-solving (Fernandez-Duque et al., 2000). Executive 
attention refers to the cognitive ability to deliberately redirect one's focus from one task to another or to inhibit the 
processing of extraneous information. This study focuses on the focus degree, as known as the intensive of 
attention, as one of the layers of information that help with analyzing performance. To indicate the state of attention 
degree, delta, theta, and alpha waves are the most used (Kaushik et al., 2022). 


Multiple studies have documented an elevation in mid-frontal theta activity, a reduction in central and parietal 
delta activity, and a decrease in frontal and parietal alpha power during states of attention (Kaushik et al., 2022). 
Additionally, the relative magnitudes of spectral power across various waveforms can serve as an indicator of 
attention levels. As per the findings of the researchers, the attention ratio, referred to as the theta/beta ratio, 
possesses significant utility as an indicator for the analysis of attention. Moreover, it has been demonstrated by 
researchers that a robust association exists between attentional degree and the ratios of theta/beta, theta/alpha, and 
alpha/beta (Derbali & Frasson, 2011; Ghasemy et al., 2019; Hillard et al., 2013). More specifically, a larger ratio 
of alpha/beta indicates a more concentrated situation, in the meantime, there is a negative correlation between the 
ratio of theta/beta and the focus degree (Derbali & Frasson, 2011). 


While EEG is commonly associated with medical and neuroscience applications, it also has some interesting and 
potential applications in the field of construction. Applications like worker cognitive load or stress level monitoring 
could help to improve construction on-site workers’ health, well-being, and productivity (Jebelli et al., 2018; Saedi 
et al., 2022). The productivity of construction workers is not solely determined by their individual workload but is 
also greatly impacted by their emotional state, particularly when encountering hazardous work conditions or 
confined spaces. The utilization of wearable EEG headsets for monitoring the emotional state of on-site 
construction workers is a potential avenue for construction managers to enhance control and optimize the overall 
workflow of building projects (Hwang et al., 2018). 


Given that construction workers consistently operate under conditions of high stress and heavy workloads, the 
matter of safety is a critical domain that researchers seek to enhance. The studies on the EEG in the construction 
site may lead to the optimization of the construction safety programs. For example, assessing the on-site worker’s 
mental workload via EEG could help managers identify individuals who are not in their best mental status and 
better arrange human resources to reduce risk and hazards (Chen et al., 2016). On the other hand, ensuring that 
personnel remain focused on their hazardous tasks and that they are not easily distracted by external factors is 
consistently crucial. Wearable EEG devices promise to identify factors of distractions of construction workers in 
hazardous tasks and to improve construction site safety (Ke et al., 2021). 


In addition to calls for its application in on-site construction workers, EEG has been utilized in laboratory studies 
based on virtual reality (VR) to enhance the performance of building or construction environments. The utilization 
of virtual environments offers a valuable opportunity to replicate real-world scenarios. By incorporating EEG band 
power scalp mapping into machine learning models, it becomes possible to assess the authentic responses of 
individuals residing in a building space. This analysis can encompass several aspects, such as comfort, pathfinding, 
and spatial utilization (Zou & Ergan, 2023). Beyond the analysis of the fatigue level from EEG signals collected 
in the virtual environment, collected EEG data can help the development of new models to improve the prediction 
and prevention of construction fall hazards (Tehrani et al., 2022). 


3. METHODOLOGY 


The presented approach is a model for the performance and assessment of individual trainees’ problem-solving 
tasks implemented in an MX environment combined with an EEG headset. The MX environment will provide a 
virtual simulation of the training tasks, and the EEG headset will collect EEG signals for cognitive analysis on the 
problem-solving task. This research work employs the Theta (4-8Hz), Alpha (8-13Hz), and Beta (13-30Hz) 
frequency bands primarily to assess the cognitive load and attention levels exhibited during a problem-solving 
task. 


Subsequently, the provision of performance feedback entails the comprehensive processing of all collected data. 
The flow of the feedback methodology process is presented in Fig.l. Detailed steps will be discussed in the 
following sections. 
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Fig.l: Workflow of the problem-solving task performance feedback process. 


3.1 Subjects’ recruitment and OSHA safety training 


The experiments involve recruiting a sample of fifty individuals aged between 18 and 35 years old, who are 
enrolled in a university program with a background in civil and construction engineering. The experimental 
procedure will be conducted within a laboratory area measuring 5 meters by 5 meters. 


Upon enrollment in the experiment, participants are required to respond to a pre-test that will inform the knowledge 
of OSHA construction safety training. After the pre-test, subjects are required to watch the OSHA construction 
safety training video and then take the OSHA construction safety examination in a virtual environment after 
becoming familiar with the manipulation of the MX device. 


The OSHA construction safety training video refers to the selected OSHA construction regulations (Huang et al., 
2003; "Top 10 Most Frequently Cited Standards | Occupational Safety and Health Administration,"). Fig. 2 is an 
example of the applied OSHA construction standard. 


1926.451(e)(1) 
When scaffold platforms are more than 2 feet (0.6 m) above or below 
a point of access, portable ladders, hook-on ladders, attachable 
ladders, stair towers (scaffold stairways/towers), stairway-type 
ladders (such as ladder stands), ramps, walkways, integral 
prefabricated scaffold access, or direct access from another scaffold, 
structure, personnel hoist, or similar surface shall be used. 
Crossbraces shall not be used as a means of access. 


Fig.2: Example of an OSHA Construction Safety standard applied in this study. 


The current approach classified the selected construction OSHA standards for violation identification in the MX 
environment scene into three tiers. The tiers are designed based on the complexity of violation identification, 
meaning the user’s level of effort required—i.e., the complexity is related to the steps to determine the violation 
in a virtual scene. Table 2 presents the details of complexity tiers for violation of OSHA standard identification. 
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Table 2: Tiers and level of effort on OSHA standard violation identification in the virtual scene. 


Tiers Level of effort based on complexity of tasks for violation identification 
Tier 1 Direct visual contact with the objects. 
Tier 2 Need to search for information to infer a violation. 
Tier 3 Need to perform actions in to determine violations 


Tier 1 is applied to those violations that can be perceived through direct observation in scenes within the MX 
environment—1.e., there is minimum learner’s effort to perceive the visualizations (visual representations) that 
produce the stimulus for the learners’ identification of the violations. Within this tier, most of the violations could 
be identified by just a single observation of the virtual objects (visual representations). Typical examples for this 
tier are conditions that represent personal protection equipment through virtual objects (i.e., visual representation 
of the worker without properly using protection equipment). Tier 2 is for conditions in the scene that demand the 
user to search for information to infer the violations. The learners’ search for information is possible by triggering 
actions in the MX environment. The learner’s action for information search serves as an additional mediation 
mechanism for inference to determine violation (e.g., the learners’ virtual rotation of an extinguisher in the MX 
scene to determine the expiration date). Tier 3 complexity consists of additional actions for inference using virtual 
instruments to determine violations. Using instruments for inference implies an additional level of complexity, as 
other cognitive capabilities (spatial and reasoning abilities) are required to determine violation (e.g., displacing 
instruments to measure distances in the virtual space). An example of Tier 3 is the learner’s required action of 
using an instrument to calculate the distance that informs whether it’s a violation or not (e.g., the placement of a 
straight ladder against a wall). 


3.2 Problem-solving task design in the MX environment 


The design consists of creating an examination of on-site construction OSHA safety checks. The examination is 
framed as a problem-solving task to draw boundaries of complexity in the search space— the possible 
configurations, number parameters, and elements in the MX environment that impact the users’ decisions and 
courses of action. 


Users (learners, trainees) are required to review the compliance of safety standards of construction of small 
commercial buildings (3 stories distributed over 40,000 Sq ft) as a project engineer from a local subcontractor 
company. While wearing the MX device (Hololens 2), users are asked to inspect and label safety standards 
violations in the first story of the building within 10 minutes. Fig. 3(a) shows the MX ongoing construction site 
prototype. Fig. 3(b) shows a scene where the user is required to use an additional instrument within the MX 
environment to determine a violation, like the measuring tape within the problem-solving task. 


(b) Integrated virtual commands for labeling virtual 
objects. 


(a) Virtual construction site. 


Fig. 3. Virtual environment for OSHA training. 


In order to facilitate unrestricted exploration of the entire construction site within a virtual environment, a 
navigation system was devised to enable virtual movements in the virtual world based on physical displacement 
in the real world (laboratory space). The navigation system allows unconstraint virtual displacements in the MX 
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environment. Due to the constraints on mobility in the physical environment, individuals are required to physically 
traverse the laboratory area in order to investigate and examine the construction site comprehensively. Instead of 
taking the examination totally virtually like a video game, this paper is trying to configure a balance between 
virtual and reality through this navigation system. 


3.3 Data collection 


The experiment design includes the collection of video, audio, and EEG data from a MX device and an EEG 
headset (see Fig.4(a) and 4(b)). The EEG data is collected from an OpenBCI Mark IV headset with a sampling 
rate of 125 Hz. The OpenBCI Mark IV headset includes 16 channels (with additional reference and ground 
electrodes in Al and A2) placed on the subject’s scalp according to the international 10-20 system. This paper 
collects EEG signals by using all 16 channels (FP1, FP2, F3, F4, F7, F8, C3, C4, T7, T8, P3, P4, P7, P8, Ol, and 
O2) as presented in Figs. 5(a), 5(b), and 5(c) (Homan et al., 1987; "Ultracortex Mark IV | OpenBCI 
Documentation,"). 


(a) Mixed-reality and EEG device. (b) Subject wearing the combined headset. 


Fig. 4: MX with an EEG device integration. 
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(a) Used channels in 10-20 system. (b) Channels numbering. (c) EEG heat map while example 
subject ongoing data collection. 


Fig. 5: Map of electrodes used in this paper’s EEG data collection process ("Ultracortex Mark IV | OpenBCI 
Documentation,"), and heat map example for EEG signal processing. 


During the implementation of the OSHA construction safety inspection, the researchers collect video, audio, and 
information on tagged violations by the subjects. All tagged violations will include the real timestamp information, 
which could help with synchronizing time stamps across different data streams and activities in the MX 
environment. 
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3.4 Post-processing 


A synchronization task follows. It consists of mapping the generated streams of data collected during the 
experiment with the real-time stamps associated with each technology device. The timestamps were used to 
identify the exact occurrence of violations during the experiments. Once violation occurrences are identified, the 
EEG data will be segmented into 20-second epochs for target sections (20 seconds before each incident violation 
was labeled or tagged). The researchers post-process EEG data in MATLAB (version 2023a) and the EEGLab 
toolbox (Delorme & Makeig, 2004). After importing raw EEG data and electrodes’ locations correspondingly, a 
FIR filter is applied to bandpass filter the EEG data to a frequency of 0.5-30 Hz to help to remove low-frequency 
drifts and power line noise. Artifacts from eye blinks and muscle movement were corrected by applying ICA in 
EEGLab (Winkler et al., 2014). 


The next steps are the extraction of features of theta (4-8Hz), alpha (8—13Hz), and beta (13—30Hz) to analyze and 
compare the mental cognitive load and attention degree across different levels of efforts. 


4. EXAMPLE OF DATA RESULTS AND DISCUSSION 


The current experimentations of the treatment and control groups are in progress. The following is an example of 
the typical experiment and data captured for one subject, including the data processing outcomes for the treatment 
group. 


The presented example shows an experiment with a subject of the treatment group who has never taken any OSHA 
construction safety-related training. The researchers asked the subject to take a personalized OSHA safety training 
session. Once the training session was finalized, the researchers asked the subject to be immersed in an MX reality 
ecosystem by wearing the MX and EEG devices. The immersive environment consists of a virtual construction 
site with multiple scenes and situations that present safety violations and hazardous conditions based on OSHA 
standards. Each violation fell into three different tiers of complexity. As an illustration, the violations’ type and 
complexity tier are presented in Fig. 6. 


Violation Action to determine Violation Action to determine ¥ Action to determine 
category the violation category violation (step 1) violation (step 2) 
rooms —m 
protection et 
equipment FER) piii <i -p 
itu "4 on. 
nF: 
ww 
The hard hat should 
be worn all the time 
on the job site. 


(a) Easy-level complexity violations. 


Violation Action to determine 
category i violation (step 2) i violation (step 3) l 


Each suspension 
rope shall be capable 
of supporting at least 

6 times the 
maximum intended 
load transmitted to 

thal rope. 


(c) Hard-level complexity violations. 


Fig. 6. Example of complexity tier violations used in the experiment. 
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The easy-level complexity violations (like PPE violations) would only require subjects to visually be in contact 
with the virtual object to check and determine the violation. For mid-level complexity violations (like mobile 
scaffolding capacity violation), the subject needs first to be in contact with the virtual object (as a target virtual 
object with potential violation)—next, the subject searches for information to make inferences and verify a 
violation. For hard-level complexity violations (like suspension scaffolding capacity violations), the subject 
requires not only the awareness of an information search task to deduce a violation but also be involved in a 
reasoning task. The reasoning task demands constructs a free-body analysis of virtual objects. 


Fig. 7 (a) shows the result of cognitive load when the subject was experiencing different efforts associated with 
the tiers of complexity violations. As introduced in the previous sections, to analyze the cognitive load, the relative 
band power of Alpha and Theta was computed (Sarailoo et al., 2022). Based on the result, it can be concluded that 
to address the increasing complexity of tasks, individuals are required to exert a higher cognitive load to arrive at 
a solution. Furthermore, since the subject's cognitive load increased with the higher complexity levels when 
solving a task, it is possible to determine the dynamics of success and failure of each subject (cognitive dynamics 
of each subject). The dynamic enables the research to correlate the efficiency of the technology for training and 
individual differences in training tasks. 
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(f) Attention degree (ratio between theta and 
beta). 


Fig. 7. Data from cognitive processes from one subject involved in MX training task. 


To analyze the attention degree when subjects address solutions in problem-solving, the band power of mid-frontal 
theta, central and parietal delta, and frontal and parietal alpha were computed. Besides, the ratio between theta and 
beta was also included as an indicator (Derbali & Frasson, 2011; Kaushik et al., 2022). As presented in Fig. 7 (b) 
(c), with the increase of the task complexity, the related power band of the mid-frontal theta increases, and the 
central and parietal delta decreases, which indicates a better intense of attention was put into the problem-solving 
task. However, the relative band power of frontal and parietal alpha exhibited a little increase as the task complexity 
escalated, indicating a potential decline in attention levels during the problem-solving activity. Besides, the ratio 
between theta and beta was also not presented as the ideal model. This result may be caused by lost calibration or 
bad connections between some channels of EEG collecting headset and subject’s scalp. Drawing precise reasons 
on this issue is challenging due to the insufficient number of trials and subjects involved. 
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Nevertheless, by analyzing the cognitive load condition and the attention degree while the subjects were facing 
the problem-solving task, the researchers could infer that some mistakes made by the subject were not because of 
the inefficiency of the training program but due to loss of attention or the lack of ability to keep on a high cognitive 
load level for a long time. In this example, the subject stayed focused during the whole session and solved all three 
violations (presented in Fig.6) successfully. However, the presented example contains a short number of decisions 
with only simple construction scenes and a limited number of violations. It’s relatively easy to keep focused on 
the problem-solving task. For subjects who face a more complex scene and can’t solve all violations, the reason 
for the mistakes (i.e., performance on correct inferences to solve the problem) will become part of the analysis, 
including the effectiveness of the MX intervention for the training program. 


5. CONCLUSION AND FUTURE WORK 


The presented research is a successful design and development of a method for the effectiveness of assessment 
safety training. The approach includes in its development the design and construction documents of the 
construction project site to build a virtual construction site. The information was used to build an MX environment 
for the learner’s self-exploration using a navigation system, enabling the learner to mimic the real workplace with 
the advantage of a mixed-reality device. The method uses EEG signals to estimate cognitive conditions that inform 
the users’ effort in decision-making while solving a problem relating to OSHA violation (i.e., virtual safety 
inspections in the MX environment). The technology consists of an MX device and a 16-channel EEG headset. 
Subjects could walk freely in the experimental space, as the portable EEG and mixed-reality devices allow them 
to collect the data wirelessly to a local network set for the experimentation. With the model developed, the 
researchers could successfully and accurately assess the subject’s cognitive load and attention levels while solving 
the construction safety-check problem. The outcome provides new and comprehensive information that helps to 
analyze the performance during learning and problem-solving. With the developed method, the researchers could 
overcome the bias from self-report evaluation or any paper-based test and get a comprehensive and personalized 
performance analysis. For future work, the researchers will model the effects of the cognitive load and attention 
degree analysis by applying machine learning algorithms for inference on the subjects’ behavior during problem- 
solving tasks. 
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ABSTRACT: Due to the practice-oriented nature of construction engineering education and barriers associated 
with physical site visits, videos are invaluable means to expose students to practical curricula content. Prior 
studies have investigated various design principles of multimedia pedagogical tools to enhance student learning 
and reduce cognitive load. These design principles and computer vision techniques can afford the design and 
usage of a multimedia learning environment with annotated content to teach students construction safety 
practices. Hence, using subjective and objective measures such as self-reported cognitive load, eye tracking 
metrics and verbal feedback, this study assesses the effectiveness of a computer vision-aided multimedia learning 
environment as well as examines variations across students’ demographics. Students were exposed to both 
annotated and unannotated versions of the learning environment. The annotated version of the learning 
environment was considered more effective in triggering students’ attention to learning content, but higher 
cognitive load levels were reported by participants. The same demographic groups that dwelled longer and on 
more annotated areas of interest also reported higher overall cognitive load. Keeping with individual differences 
principle of multimedia learning, demographic variations in participants' cognitive load and effectiveness of the 
learning environment were reported. The study provides implications for instructors in construction engineering 
programs on effective use of computer vision-aided annotated videos as instructional materials. This study could 
serve as a benchmark for future studies on artificial intelligence techniques for signaling in multimedia learning. 
This study reveals the affordances of computer vision-aided multimedia learning in construction engineering 
education and the need for adaptation of multimedia learning tools to students’ demographics. 


KEYWORDS. Computer vision, construction engineering education, demographic differences, multimedia 
learning, video. 


1. INTRODUCTION 


Construction-related disciplines are applied science; hence they are rich in practical components which are usually 
difficult for instructors to cover in the classroom (Gunhan, 2015). The imbalance between theory and practice has 
been one of the challenges in preparing students for the workplace (Afonso et al., 2012). Hence, academia is in 
constant effort to achieve a proper blend of theory and practice (Bozoglu, 2016). Site visits are being used to 
circumvent this challenge by exposing students to real-world examples, spatio-temporal scenarios of construction 
operations and interaction with practitioners (Eiris Pereira & Gheisari, 2019). However, barriers associated with 
site visits such as safety, coordination, distance/location, limited what-if scenarios, and concerns for disabled 
students(Eiris Pereira & Gheisari, 2019) have necessitated the need for new methods of bringing practical 
examples into the classroom. Videos are now increasingly being widely used as pedagogical tools to address these 
limitations (Shojaei et al., 2021). Videos enable instructors to bring the real world into the classroom. Videos also 
allow for experiments, site visits, and demonstrations that otherwise would have been impossible. Beyond 
knowledge transmission, videos expose students to diverse experiences, attitudes, and emotions and promote 
interactions and discussions (Ferreira et al., 2013). However, the use of videos also comes with some challenges. 
For example, if not intelligently designed, videos could be ineffective for learning because they could increase 
cognitive load, and not capture learners’ attention (De Koning et al., 2009). In addition, videos could contain non- 
essential information which could be distracting to learners. These downsides of videos have been earlier reported 
(Homer et al., 2008). To circumvent these challenges, the adoption of multimedia learning principles such as 
removal of extraneous content and signaling of important learning content are effective measures (Mayer & 
Fiorella, 2014). Signaling involves the use of cues (e.g., arrow, boundary boxes, color contrast) to point out 
important learning content to learners in a multimedia environment. In other domains, previous studies have 
demonstrated the effectiveness of these techniques in ensuring that videos are effective pedagogical tools (De 
Koning et al., 2009; Navarro et al., 2015). 


Given the advances in computing, its affordances and wide applications, manual signaling methods which could 
be laborious, undynamic and time intensive can be replaced with automation afforded by artificial intelligence. 
By leveraging computer vision (CV) techniques (such as object and interaction detection), construction videos 
can be automatically annotated to call out specific learning contents. This has been demonstrated in other 
endeavors such as detecting human daily activities using convolutional neural networks (Zhang et al., 2017). 
Adopting this in construction engineering education is important given the need to visualize theoretical concepts, 
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understand the pace and sequence of construction tasks, and spatiotemporal nature of construction activities (Eiris 
Pereira & Gheisari, 2019). Previous studies (Abdulrahaman et al., 2020; Stark et al., 2018) have highlighted the 
need to evaluate the efficacy of multimedia learning environments. To evaluate multimedia pedagogical tools, 
learners’ cognitive load level and demographic differences have been suggested as primary considerations 
(Grimley, 2007). Also, earlier studies have combined objective and subjective measures and compared two or 
more multimedia learning environments (Abdulrahaman et al., 2020; Stark et al., 2018). Despite the potential of 
multimedia learning in construction engineering education, it has received little attention in literature, especially 
the application of CV in multimedia learning. Hence, this study compared two multimedia learning environments 
from the lens of demographic differences to evaluate the effectiveness of CV-aided multimedia learning in 
construction engineering education. 


2. BACKGROUND 
2.1 Application of Computer Vision in Education 


Computer vision is being increasingly leveraged in several educational contexts because of its value to 
conventional educational methods by improving teaching and learning (Savov et al., 2018; Sophokleous et al., 
2021). For instance, using a face recognition algorithm, Savov et al. (2018) combined computer vision, and 
internet of things to provide engaging experience for students. The study demonstrated the efficacy of computer 
vision through adaptation to learners’ facial expression to help instructors tailor the teaching process to learners’ 
preferences. Tetiana et al. (2021) also leveraged computer vision and augmented reality to allow students to 
interact with and obtain additional virtual information about research objects. This helped to promote effective 
interaction between students and educational material. Similarly, using facial emotions, pose estimation, and head 
rotation, Poonja et al. (2023) developed a computer vision-based system to detect students' engagement in online 
learning. Computer vision has been adopted in other educational context such as enhancing the teaching of 
mechatronics (Tudić et al., 2022), in distance education of new generation labor productivity (Zhao & Li, 2021), 
as well as educational robotics in K-12 education (Sophokleous et al., 2021). Interaction and object detection 
techniques of computer vision can be leveraged to signal essential learning contents in videos for teaching 
purposes. For example, Tang et al. (2020) used Faster Region-proposal Convolutional Neural Network (Faster 
RCNN) to detect workers and materials for safety monitoring. Using deep residual learning network, Hashimoto 
et al. (2019) used computer vision for automated operative step detection during Laparoscopic Sleeve 
Gastrectomy. Similarly, Aronson (2018) demonstrated the efficacy of computer vision for signaling violation of 
human right in videos. However, there are scarce studies that demonstrated the effectiveness of computer vision 
for signaling in multimedia learning. 


2.2 Evaluation of Multimedia Learning Tools 


To evaluate multimedia learning tools, a comparison approach which involves comparing two or more multimedia 
pedagogical tools is a common method (Abdulrahaman et al., 2020). For example, Chiu et al. (2018) compared 
the efficacy of annotated and unannotated versions of a video to teach cardiopulmonary resuscitation. Using pre 
and posttests, eye tracking metrics, satisfaction and self-reported cognitive load questionnaires, the study reported 
that students that learned with annotations had lower cognitive load, concentrated more on the critical parts of the 
instructional video, and thus learned more effectively and easily. Also, combinations of objective and subjective 
measures have been encouraged in usability evaluation (Abdulrahaman et al., 2020). This is due to the limitations 
of subjective measures such as risk of prejudice, lack of response, and lack of supporting evidence for respondents' 
ratings (Kelley et al., 2003). Objective measures such as eye tracking metrics (e.g., fixations and dwell times) are 
widely used in the usability evaluation of multimedia learning tools (Molina et al., 2018; Stark et al., 2018). 
National Aeronautics and Space Administration Task Load Index (NASA-TLX) is a widely used subjective 
evaluation tool for assessing cognitive workload. NASA TLX assessed cognitive workload across six subscales: 
Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration (Sharek, 2011). Eye 
Tracking and NASA TLX have been used in previous studies (Latifzadeh et al., 2020; Law et al., 2010) for the 
evaluation of multimedia pedagogical tools. Other subjective measures such as think-aloud protocol, interview, 
and verbal feedback are also being used (Abdulrahaman et al., 2020). Also, to evaluate the usability of multimedia 
pedagogical tools, demographic differences such as gender (Grimley, 2007), academic level, academic program 
(Castro-Alonso et al., 2019), ethnicity (Moreno & Flowerday, 2006) and prior experience (Kalyuga et al., 2000) 
are deemed important considerations. 
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3. METHODOLOGY 


3.1 Overview 


The study evaluated the efficacy of an annotated video designed to teach construction students different 
construction safety practices. A comparison approach was adopted. Students were exposed to two learning 
environments. One was designed with computer-vision aided signals or cues to call out essential learning content 
while the other was not. Subjective and objective measures of the participants in the two learning environments 
were compared. In addition, keeping with individual differences principle of multimedia learning, demographic 
variations in participants' cognitive load and eye tracking metrics in the annotated learning environment were 


assessed. 


Annotated Learning 
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O 


Comparison of Learning 


ronm en 


Unannotated Learning 
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Fig. 1: Overview of Methodology 


3.2 The Learning Environments: Annotated and Unannotated Videos 


The two versions of the video had the same contents, each about 6-minutes long. The videos contain both visual 
and audio presentations of eleven (11) construction safety practices. These include Personal Protective Equipment 
(PPE), Situational awareness (SA), Secure workers at height (SWH), Securing materials (SM), Exclusion zone 
(EZ), Deep excavation (DE), Signage (S), Fall protection (FP), Mobile phone use (MPU), Ladder use (LU), 
Ergonomics (ER). These safety practices were identified by Olayiwola, Yusuf, et al. (2023) as critical to 
complement classroom teaching during site visits. The safety practices represent areas of interest (AOIs). Safety 
practices were chosen because Pedro et al. (2016) highlighted the need for knowledge of safety practices in 
preparing the future workforce. The annotated video contains computer vision-aided signals to call-out the safety 
practices while the unannotated video does not. Both object and interaction detection techniques of computer 
vision were used in this study. Faster RCNN, a deep learning technique was combined with Visual Geometry 
Group network (VGG16) (a convolution neural network architecture) to signal the construction safety practices 
with boundary boxes. Visual translation embedding network (VTransE) was used as the interaction detection 
technique. The details of the design and development of the annotated learning environment are presented in 
Olayiwola, Akanmu, et al. (2023). Examples of frames from the annotated and unannotated videos are shown in 
Figure 2. 


3.3 Participants and study approval 


After the Virginia Tech Institutional Review Board approved the study, thirty-five (35) participants who are 
students in construction-related programs volunteered to participate. Twenty (20) of them used the annotated 
learning environment while fifteen (15) used the unannotated learning environment. All the participants are 
between 20 to 24 years old. The participants' demographics are shown in Table | below. 
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(a) Sample frame of annotated video (b) Sample frame of unannotated video 
Fig. 2: Frames of Annotated and Unannotated Videos. 


Table 1: Participants’ Demographic Information. 


Demographics Experimental Group (N=20) Control Group (N=15) 
Gender 
Male 12 12 
Female 8 3 


Academic Program 


Building Construction (BC) 3 3 

Construction Engineering and 9 10 
Management (CEM) 

Civil and Environmental Engineering 6 - 

(CEE) 


Academic Level 
Junior 4 10 
Sophomore 16 5 


Years of construction experience 


Less than 2 13 11 

2-5 T 4 
Ethnicity 

White/Caucasian 12 9 

Asian > 3 

African American 2 2 

Hispanic/Latino 1 1 


3.4 Experimental design and Data collection 


Before the experiments commenced, every participant was intimated with the workflow of the experimental 
procedure. Thereafter, the participants signed the informed consent form and completed the demographic 
questionnaire. A web-based eye tracker (Gaze-recorder) was used in this study. The participants' eyes were 
calibrated, and then they watched the 6-minute video of construction safety practices. Eye-tracking data was 
collected as the participants watched the video. Two separate experiments were conducted. In the first experiment, 
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the experimental group (n = 20) was exposed to the annotated video while in the second experiment, the control 
group (n = 15) was exposed to the unannotated video. After every session, participants completed three subscales 
of the NASA-TLX questionnaire (i.e., Mental demand, Effort, and Frustration). These subscales have been used 
to assess cognitive load in multimedia learning (Refat et al., 2020). Finally, the participants’ verbal feedback was 
audio-recorded. Each session lasted for about one (1) hour. 


3.5 Data analysis 


Dwell times of the participants were collected for the eleven AOIs in the videos. Wilcoxon Rank Sum Test was 
used to test for statistically significant differences since the comparisons were between two independent groups 
and independent observations. Descriptive statistics were used for the self-reported cognitive load. MS Office 
Excel and SPSS were used for the analysis. The verbal feedback was transcribed and analyzed using cluster 
analysis. The analyses were done by comparing the participants’ dwell times on the AOIs of the annotated and 
unannotated videos and the demographic differences of the participants. 


4. RESULTS AND DISCUSSION 


4.1 Dwell Time Comparison in Annotated and Unannotated Video 


As shown in Figure 3, significant differences in the dwell times were found between the annotated and 
unannotated videos for seven (7) AOIs of which four (4) had significantly higher dwell times in the annotated 
video. These include PPE, Situational Awareness, Securing Material, and Signage (p < 0.05). The overall dwell 
time of the participants was higher in the unannotated environment although no statistically significant difference 
was observed. However, the participants dwelled longer on more of the AOIs in the annotated video. The 
participants dwelled longer on 7 out of the 11 safety practices in the annotated video. In the unannotated video, 
the participants only dwelled longer on four (4) safety practices, which include Secure workers at height, Fall 
Protection, Use of Mobile Phone and Ergonomics. This result shows that although the participants spent more 
time in the unannotated environment, they did not dwell longer on more AOIs. Whereas the participants spent 
less time in the annotated environment but dwell longer on the AOIs. This shows that the annotation was effective 
to direct the learners to important learning content and to stimulate their interest. This finding agrees with Molina 
et al. (2018) who explained that learners might dwell more on AOIs in multimedia learning. The higher dwell 
times on the AOIs in the annotated video shows the extent of focus and interest in the AOIs (Bojko, 2013; Carter 
& Luke, 2020). This result aligns with the findings of previous studies (Molina et al., 2018; Navarro et al., 2015) 
which underscored the potential of signals to trigger learners’ interest, improved learners’ visual search efficiency 
and provide greater visibility for important learning content. 
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Fig. 3: Dwell Time on AOI 
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4.2 Comparison of Demographic Differences within Annotated Video 
4.2.1 Gender 


Within the annotated environment, gender differences were observed, as shown in Figure 4. On the overall, female 
students dwelled longer on the AOIs than their male counterparts. The overall dwell time of the female students 
was statistically higher (p < 0.05). Also, the female participants dwelled more on each of the AOIs than their male 
counterparts. This phenomenon reveals that signals were efficacious in drawing the attention and stimulating the 
interest of female students than male students. The female participants’ dwell times on the AOIs were significantly 
higher for three (3) AOIs, which include Secure Workers at Height, Deep Excavation and Use of Mobile Phone. 
This could mean that female students prefer to learn with annotated videos than male students. This could be 
helpful to instructors in their choice of instructional materials. This finding contributes to prior studies (Grimley, 
2007; Saha & Halder, 2016) which have shown gender differences in information process in multimedia learning. 
The finding agrees with Dousay and Trujillo (2019) who reported that females had higher situational interest in 
multimedia learning than males. 
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Fig. 4: Male and Female Dwell Time for Annotated Video 
4.2.2 Academic Level 


On the overall, Figure 5 shows that the senior-level students dwelled longer (although not significant, p>0.05) on 
the annotated video than junior-level students. Also, the senior-level students dwelled more on nine (9) out of the 
eleven (11) AOlIs than their Junior-level counterparts. The Senior-level participants dwelled more on all AOIs 
except Use of Mobile Phone and Ergonomics. The results showed that the Senior-level participants significantly 
dwelled longer (p<0.05) on Exclusion Zone (p<0.05). The academic level of the students could be synonymous 
with prior knowledge. Although, previous studies (Grimley, 2007; Kalyuga, 2013) have noted that multimedia 
learning would be effective for learners with lower prior knowledge, the findings of this study differ from Navarro 
et al. (2015) who reported that students of lower academic levels dwelled more on AOIs while senior students 
only take a glance. The difference in the finding could be because the participants in this study were college 
students (aged 20-24 years) while those in prior studies (Grimley, 2007; Navarro et al., 2015) were primary school 
pupils (aged < 11 years). Also, in this study, better than junior-level participants, the senior-level participants 
might have been more willing to explore the annotated learning environment and found the signaled concepts 
more engaging which might have been responsible for dwelling on more signaled concepts. This result contributes 
to earlier studies, e.g., Castro-Alonso et al. (2019), on academic level-based differences in multimedia learning. 
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4.2.3 Years of Experience 


As shown in Figure 6, although no statistically significant difference was observed, overall, students with 0-2 
years of experience had higher dwell time on the AOIs. The students dwelled more on the AOIs (8 out of 11) than 
those with 2-5 years of experience. The students with 2-5 years of experience only dwelled longer on PPE, 
Situational Awareness and Secure of Materials. This result shows that the learners with lower experience would 
perceive better learning benefits with the annotated learning environment. This study contributes to prior studies 
e.g., Kalyuga et al. (2000) that have demonstrated differences based on prior experience in multimedia learning. 
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Fig. 6: Dwell Time for Annotated Video Based on Years of Experience 


4.2.4 Academic Program 


No statistically significant difference was observed in the comparison based on students’ academic program. On 
the overall, students in CEE dwelled longer on more AOIs than those in BC and CEM respectively (Figure 7). For 
six (6) of the AOIs namely, Situational Awareness, Exclusion Zone, Deep Excavation, Fall Protection, Use of 
Mobile Phone, Use of Ladder, the ascending order of the increase in dwell time on the AOIs is BC students, CEM 
students and CEE students. The variation observed based on academic programs could be due to differences in 
the emphasis of each program even though they are all aspects of construction education. For instance, Abudayyeh 
et al. (2000) pointed out that the CEE program is focused more on design of facilities; CEM program has 
concentration on achieving a balance between the engineering and management component of construction, while 
BC program has emphasis on management and business components of construction. Hence, the differences in 
the educational background of the students could have been responsible for the variations. For example, since 
CEE students are in a program that focused on design of infrastructural facilities, they might have been less 
familiar with the construction safety practices in the annotated video compared to their counterparts in CEM and 
BC programs. This could account for their higher dwell times on most of the AOIs. This result contributes to 
earlier studies e.g., Castro-Alonso et al. (2019) on differences based on academic program in multimedia learning. 
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Fig. 7: Dwell Time for Annotated Video Based on Academic Program 
4.2.5 Ethnicity 


Though without any significant difference, the result reveals that on the overall, White students dwelled longer 
on the annotated video (Figure 8). They also dwelled longer on more AOIs (6 out of 11) than students of other 
ethnicities. Studies comparing ethnic differences in multimedia learning are scarce. In this study, due to the small 
sample sizes, only White students who made up 60% of the participants were compared with other ethnicities. 
This study contributes to the few existing studies such as Moreno and Flowerday (2006) that have examined ethnic 
differences in multimedia learning. 
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Fig. 8: Dwell Time for Annotated Video Based on Ethnicity 


4.3 Cognitive Load 


As shown in Figure 9, the participants reported higher effort and frustration for the annotated video, however, 
they reported lower mental demand for the same. This shows that it was not mentally demanding to learn with the 
annotated video, but participants put in more effort and experienced higher frustration. This could be because of 
their unfamiliarity with the annotated video, the effect could be abated as learners get used to the video. Overall, 
the participants experienced a higher cognitive load in the annotated learning environment. No significant 
difference was observed between the annotated and unannotated video (p > 0.05). This result differs from Chiu et 
al. (2018) who reported that students who learned with annotated video experienced lower cognitive load. This 
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difference could be attributed to other factors such as participants’ ethnicity (Moreno & Flowerday, 2006), visual 
preference (Homer et al., 2008) and media type (Castro-Alonso et al., 2019) which are moderating variables in 
multimedia learning. For instance, Homer et al. (2008) reported that low visual-preference learners experienced 
higher cognitive load than high visual-preference learners in learning with video. 
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Fig. 9: Participants’ ratings of perceived mental demand, effort, and frustration after interaction with both video 


The demographic comparison of cognitive load in annotated video is shown in Figure 10. Although no statistically 
significant difference in any of the comparisons (p>0.05), female participants reported higher cognitive load 
compared to male participants. Similarly, participants with 0-2 years of experience self-reported higher cognitive 
load than their colleagues with 2-5 years of experience. For the academic programs, CEE students reported the 
highest cognitive load, followed by CEM students, while BC students had the lowest cognitive load rating. 
Demographic comparison reveals variations in the cognitive load level of the students. This variation could help 
instructors to design and adapt multimedia learning environments to suit learners of various categories. This is 
especially important to instructors especially those in engineering-construction educational programs, where 
effort is required to attract and retain female high school students and those from underrepresented groups (Choi 
et al., 2022). 
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Fig. 10: Demographic comparison of cognitive load in annotated video 


4.4 Verbal Feedback 


Most of the participants that watched the annotated video opined that the annotation was effective in making them 
learn easier. The participants reported that the annotation helped them to focus on learning content and their 
attention was held. This is because in addition to the audio narration, the annotation helped the student to easily 
identify specific areas to focus on in the video which helped them to better understand the learning contents. Only 
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three participants reported that the signals were distracting. The participants suggested using arrows instead of 
boundary boxes and highlighting their content because some learners might focus on the boundary boxes more 
than the contents within them. Some participants also suggested making the video more interactive and reducing 
the tempo. The students attested to the potential of the video for online courses and helping to learn about more 
construction practices without having to visit a job site. 


5. CONCLUSION, LIMITATIONS AND FUTURE WORK 


Characteristics of learners influence cognitive load and the efficacy of designs in multimedia learning. Hence, it 
is important to evaluate multimedia pedagogical tools to assess their suitability for intended context, purpose, and 
users. This study evaluates the efficacy of an annotated video which is a computer-vision-aided multimedia 
learning for teaching construction safety practices. The study reveals that computer vision generated signals were 
effective in drawing the intention of learners to fixate on AOIs. Within the annotated learning environment, female 
students, students with 0-2 years of experience, senior-level students, CEE students, and white students dwell 
longer and on more AOIs than their counterparts. However, these categories of participants reported higher overall 
cognitive load for the annotated video compared to their counterparts. The study also shows demographic 
differences in the cognitive load level of participants based on gender, ethnicity, academic level, years of 
experience and academic program. The results reveal that the demographic classes that dwelled more on the AOIs 
also reported a higher cognitive load. The results of this study could help instructors in engineering-construction 
education programs to effectively use annotated videos as instructional materials. This study could serve as a 
benchmark for future studies on artificial intelligence techniques for signaling in multimedia learning. The study 
opened a discussion on demographic differences in multimedia learning within the construction engineering 
education domain as well as the efficacy of artificial intelligence techniques in the design of multimedia 
pedagogical tools. This study has some limitations which could be the focus of future research. For example, only 
subjective rating of cognitive load was used, future research could combine both objective and subjective 
measures to assess cognitive load level in multimedia learning. Also, the small sample size of this study could 
have been responsible for some lack of significant differences in the comparisons made across the demographics 
of the participants. The small sample size also limits the generalizability of the findings. Future research could 
use higher sample sizes. Also, effects of age differences and academic levels (i.e., elementary school, high school, 
and college) of participants in multimedia learning could be the subject of future work. 
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ABSTRACT: In the field of AEC, Architecture Engineering and Construction, Building Information Modelling has 
increasingly assumed an important role, especially for construction simulation. BIM is needed for various building 
and management systems, particularly for project construction management. Students, teachers and operators of 
AEC need to have the availability of data, reports, pieces of information that allows to create BIM. "BENEDICT" 
is an European Erasmus project that has the aim of developing a web-based platform for BIM teaching that has a 
tight relationship with the AEC industry. Therefore, a BIM-enabled Learning Environment (BLE) can be used to 
implement BIM-based project planning and control system for learners and future practitioners, with Open 
Learning Resources. Open Learning Resources (OLR) are learning, teaching and research materials in any format 
and medium that are useful for teaching, learning, and assessing and for research purposes. In addition to the 
BLE platform, a BIM-model repository was developed to store information for each component of the project and 
of the learning activities. The repository can also store OLR and students’ outputs. The BLE repository has the 
task of helping students and practitioners to implement BIM actual project models by developing an on-line 
repository of digital models, objects and elements, therefore providing knowledge transfer between different 
players. 


KEYWORDS: Building Information Modelling, Benedict project, Open Learning Resources, Construction. 


1. INTRODUCTION 


The construction industry is one of the largest in the world economy, with about $ 10 trillion spent every year for 
construction related goods and services. However, the industry productivity has lagged behind that of other 
industrial sectors, such as manufacturing and retail, that have implemented digitization and innovation, increasing 
their productivity over time. This productivity gap has many causes, and Building Information Modelling (BIM) 
is considered one fundamental strategy to recover the desired level of performance (European Construction Sector 
Observatory, 2021). Building Information Modelling (BIM) is the use of a shared digital representation of a built 
object to facilitate design, construction and operation processes to form a reliable basis for decisions (ISO 
294811:2016). A built object can be a building, a road, a bridge, a process plant, everything that belongs to the 
built environment. A building construction information model is a shared digital representation of physical and 
functional characteristics of a built object (ISO TS 12911), therefore the term modelling addresses the process of 
managing information related to the facilities and project in order to coordinate multiple inputs and outputs, 
regardless of the specific implementation. Therefore, BIM is a method or strategy, not a tool. In the construction 
sector, knowledge transfer between different players, owner, designers, construction specialists, and project 
operators, together with project procurement take place by data exchange, i.e. information exchange. Among the 
specific features of the BIM methodology there is the ability to store information for each individual component 
of the project, including three dimensional properties and data concerning materials, building products, structure, 
quality performances, construction operations, transformation or installation stages, maintenance, time and cost 
data, sustainability and health and safety related information. Therefore, the fundamental element of this method 
is a digital model capable of n-dimensional representation of a building. BIM is considered a powerful approach 
to improve productivity in Architecture, Engineering and Construction sector. The use of BIM is spreading rapidly 
in many countries, covering a wide range of project both in the public and private sectors. Digitization of 
construction sector involves the need of helping students and practitioners to implement BIM actual project models 
by developing an on-line repository of digital models, objects and elements. Particularly focusing on educational 
processes, there is a strong need of developing a shared, online BIM models repository to provide an effective and 
coherent basis for BIM project implementation (Becerik-Gerber, 2012, Boeykens et al. , 2013, Clevenger et al. 
2013, Puolitaival, Forsythe, 2016) The BENEDICT project, BIM-Enabled Learning Environment for Digital 
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Construction, is an Erasmus+ strategic partnership between the Department of Civil Engineering and Architecture 
at Tallinn University of Technology (Estonia), the Civil Engineering Unit of Tampere University (Finland) and the 
Department of Architecture at the University of Bologna (Italy). The BENEDICT project deals with how to teach 
courses at university level with BIM tools, in particular through the use of an IT platform for BIM models (Olowa 
et al. 2022, Ruutman et al., 2022, Witt, Kahkonen, 2019). The fundamental needs of Real Estate and Construction 
professionals and students, Architects, Engineers, Construction Managers, concerning Building Information 
Modelling involve the design, development and implementation of various building and management systems, for 
instance: 


— Architectural systems and space coordination: i.e architectural layout and spatial units (size and 
coordination, proximity relationships, internal partitions), 

— Structural systems: i.e foundations, poles, structural slabs and basement structures, superstructure, 
reinforced concrete framework, GLT and solid timber frame, CLT and prefabricated panels, floors and 
roof structures; 

— Enclosure systems: i.e. architectural language and facades, doors and windows, architectural finishes, 
waterproofing, roofing; 

— Mechanical /Electrical / Plumbing - MEP systems: i.e. connection systems i.e. elevators, mobile staircases, 

— Construction project systems: i.e construction site provisions and equipment (e.g. scaffoldings, tower 
crane, formworks etc.). 

— Project Construction Management systems: i.e project control methods and tools concerning project 
description, integration and implementation, project planning and time management, project risk 


management, project cost, quality and resources management. 


The needs of BLE users — learners, teachers, system administrators — consist in having the availability of data, 
reports, pieces of information concerning architecture — engineering systems. The technical data and information 
concerning design, development and installation of the building and its project management allows BLE users to 
create the Building Information Model. For example, construction management students will need a set of case 
studies to be tested with practical exercises and the Open Learning Resources will be supplied as actual case studies 
- each case study consisting of a building or facility that has been designed and engineered in industry or in previous 
courses. Learning experiences using these will greatly enhance BIM-enabled learning where BIM-based 
workflows will provide immersive learning and training opportunities. BIM — enabled learning can use a virtual 
platform, a web site and repository, where all BIM models, examples and data can be stored and used. This creates 
a BIM enabled Learning Environment, BLE. The BLE provides the learning environment or web platform 
specifically designed to support this type of learning. Key resources for the use of the BLE are Open Learning 
Resources. 


2. OPEN LEARNING RESOURCES AND VIRTUAL DATA ENVIRONMENTS 


The simulation of actual design and project management activities that takes place in teaching AEC modules with 
BIM as a media has the need of a common data environment. A Common Data Environment CDE is a single 
source of information for any given project, used to collect, manage, and disseminate all relevant approved project 
documents for multidisciplinary teams in a managed process (BS EN ISO 19650). A CDE has four different 
environments where models and data can be stored: the work in progress area, the shared area, the published area 
and the archive. With the aim of creating a virtual environment for learning and teaching activities two different 
virtual environment were developed, the BLE platform and the OLR repository. The OLR repository is not a CDE 
because does not fulfill the requirements of ISO 19650, but was developed with the aim of storing BIM models 
and data. The BLE platform and repository create a virtual environment where teachers, learners and system 
administrators can store data, reports, pieces of information concerning architecture and engineering systems of 
the built object under design. All of these technical data and information concerning the different stages of 
production of a building, design, i.e. concept design, space coordination and technical design, construction and 
installation, operation and maintenance allows user to create the Building Information Model. Construction 
Management students, as an example, will need to use a set of case studies to be tested with practical exercises. 
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Open Learning Resources (OLR) or Open Educational Resources (OER) will be supplied as actual case studies — 
each case consisting of a building or a facility that has been designed and engineered in industry or in previous 
courses. The BLE, will be used to store and manage both OLR and BIM models, output of the students’ work. 
Therefore, the BIM — Enabled learning environment will provide a virtual environment where educational 
activities in the AEC sector can be performed using BIM-based technology. Open Educational Resources (OER) 
are learning, teaching and research materials in any format and medium that reside in the public domain or are 
under copyright that have been released under an open license, that permit no-cost access, re-use, re-purpose, 
adaptation and redistribution by others (UNESCO, 2019). Open educational resources (OER) are freely accessible, 
openly licensed instructional materials such as text, media, and other digital assets that are useful for teaching, 
learning, and assessing, as well as for research purposes. The term "OER" describes publicly accessible materials 
and resources for any user to use, re-mix, improve, and redistribute under some licenses. These are designed to 
reduce accessibility barriers by implementing best practices in teaching and to be adapted for local unique contexts. 
The BENEDICT project has the aim of promoting a new concept of learning/training in the REC sector. The Open 
Learning Resources are essential for users to benefit from the BLE as they provide real (or near-real) project data 
for learners to work with and this will demonstrate the practical implementation of BIM workflows. The BIM- 
enabled learning environment creates a repository of OLR that can be descriptions of projects, technical BIM 
models, and project plans (table 1). 


Table 1: Type of Open Learning Resources. 


Descriptions of projects project objectives; site description and analysis; | .docx; .xlsx; .pdf; .dwg; dxf; 


media concerning the site; building overall | xml; mp4; JPG; (...) 
concept description; statement of work (SOW); 


building systems reports, drawings and 


calculation 
Technical BIM models BIM objects; BIM model ifc 
Project Plans architecture and envelope layout; structure | docx; .xlsx; .pdf, .dwg; dxf; 


layout; MEP systems layout, construction | xml; mp4; JPG; (...) 
process. bills of quantities; budgets; schedules; 
resource estimation, procurement 


documentation concerning materials, products, 


components and other supplies; safety plans 


Open learning resources for BLE need to be checked before model processing. BIM models should be checked 
also concerning the achievement of the desired level of detail / level of development (LOD) and quality assessment 
consisting in code checking and model checking. The purpose of defining the level of information need is to 
prevent delivery of too much or too little information (ISO 19650-1:2018). In particular, the project information 
requirements (PIR), in relation to the delivery of an asset indicate for what, when, how and for whom information 
is to be produced. The Level of Information Need (LOIN) has to be set by applying the BS EN 17412-1 that 
indicates the framework to set the LOIN. Firstly, four pre-requirements addressing the context needed to identify 
the information content have to be set: BIM uses, milestone, actors, object. After this stage, the level of information 
need must be set concerning geometrical information, alphanumerical information, and documentation (BS 
EN17412-1:2020) (figure 1, figure 2). In the specific case of construction management — oriented applications, 
Open Learning Resources will be supplied to students and applicants as actual case studies. Each case study 
consists in one or more than one building or civil engineering facility that has been designed and engineered in 
previous courses of the university programme, or provided by teachers or by the BENEDICT project associated 
partners. As an example, the following documentation / information can be produced by the students of 
construction engineering and management courses with Building Information Modeling. 


— Project Planning, job site design & safety planning; 
— Work Breakdown Structure; 
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— Construction project schedule; 
— Construction site design; 
— 4D BIM animation. 


Fig. 1: Relationship diagram on level of information need. 


|___ LOD 100 LOD 200 LOD 300/350 LOD 400 LOD 500 

| | | 

| Concept phase Conceptual Tender project Construction As built project 
| design state project 


Fig. 2: Example of the concept of "continuum" associated to the detail of a door. 


3. BIM ENABLED LEARNING ENVIRONMENT AND OLR CATEGORIZATION 


The BLE (BIM enabled learning environment) virtual environment has the task of integrating BIM strategy and 
technologies into curricular activities, i.e. course modules. The BLE environment consists of the BLE Platform, 
that hosts pilot modules OLR and a repository that includes a Content Management System and a server that hosts 
BIM models and other OLR (fig. 3). The pilot modules section addresses the different pilot modules of the 
BENEDICT project: integrated design module, risk management module and time management module (fig 4). 
The repository includes a Content Management System CMS and a Data base DB for storage of OLR and 
students’outputs, (fig. 5). Both sections can be used by different actors, with different navigation capacities, 
depending on the type of user, teacher, learner, and system administrator (fig. 3). 


The navigation capacity is of capital importance as depends on data and BIM object categorization. BIM models 
can be classified as types of models and model elements. All models are composed of model elements that have 
properties and attributes. Each native BIM authoring tool, as well as IFC, uses its own unique terminology to 
describe these components. It is therefore important to first understand what is considered an element and how 
elements relate to one another in order to discuss them. 
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Fig. 3: BIM-enabled Learning Environment (BLE) — system architecture. 
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Fig. 4: BIM-enabled Learning Environment (BLE) — BLE platform. 
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Fig. 5: BIM-enabled Learning Environment (BLE) — OLR repository. 


258 


SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


Due to the complexity of buildings and BIMs, a simple hierarchy does not suffice to describe the relationship 
between model elements (US GSA BIM Guide 07). A sophisticated ontology is required to develop an 
understanding of how model elements may relate to one another. All the levels in the model ontology have 
properties associated with them, and thus the properties of one model element are associated with related model 
elements. A BIM ontology is an informal, semi-structured, conceptual domain ontology used for knowledge 
acquisition and communication between people. 


Fig. 6: Federated Model (from US GSA BIM guide 07). 


A federated building information model is an assembly of distinct discipline models to create a single, complete 
model of the building. A federated model is a model composed of multiple linked models that contains architectural, 
structural, and mechanical, electrical, and plumbing (MEP) information of a building (US GSA BIM guide 07). 
Federation is the creation of a composite information model from separate information containers (ISO 19650 -1). 
A stand-alone model is a single discipline model, an information model that is a set of structured and unstructured 
information containers (ISO 19650 -1). The Association of General Contractor of America, AGC, in the AGC 
Consensus Docs 301- BIM Addendum (AGC, 2015) defines a federated model as a model consisting of linked but 
distinct component models, drawings derived from the models, texts, and other data sources that do not lose their 
identity or integrity by being so linked, so that a change to one component model in a federated model does not 
create a change in another component model in that federated model. A single federated model is useful for design 
co-ordination, clash avoidance and clash detection, approvals processes, design development, estimating and so 
on, but the individual models do not interact, they have clear authorship and remain separate. This means that the 
liabilities of the originators of the separate models are not changed by their incorporation into the federated model 
(fig. 6, fig. 7). 


Fig. 7: Single discipline model- stand alone MEP Model of the Building (from US GSA BIM guide 07) 


Categorization is of capital importance to achieve effective information management. Classification can be defined 
as: 'The act or process of dividing things into groups according to their type. Uniclass is based on the general 
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structure described in ISO 12006, which promoted the use of classification classes, each of which relates to a 
classification need. As well as products (or objects), some of the other classes suggested by ISO 12006 are: 

— Entity e.g. a building, a bridge, a tunnel; 

— Complex (a group of entities) e.g. airports, hospitals, universities, power stations; 

— Space e.g. office, canteen, parking area, operating theatre; 

— Product e.g. boiler, door, drain pipe; 

— Facilities, this combines the space with an activity which can be carried out there, for example, an 
operating theatre; 

— Other classes can be added, such as 'system', which works very well in an MEP environment. Similarly, 
an 'activities' class would be very helpful for defining a range of activities which might be able to be done 
within a particular space, as an alternative to using the 'facilities' class. 

The organization of information about construction works is of capital importance for Building Information 
Modelling, therefore a framework for classification is proposed by ISO 12006 standard as showed in the following 
tables (table 2). Information are relevant to particular stages in a building construction project, therefore, life cycle 
stages should be defined on a common basis. Building life cycle stages proposed by ISO standards are the 


following: inception; brief; design; production; maintenance and demolition. These principal stages are further 
decomposed to provide a meaningful set of stages for exchange requirements. 


Table 2: Standard principal and decomposed life cycle stages (ISO 12006-2:2015). 


Life cycle stage Principal life cycle stage Decomposed life cycle stage 
Pre-life cycle stages Inception Portfolio requrements 
Brief Conception of need 
Outline feasibility 


Substantive feasibility 


Pre-construction stages Design Outline conceptual design 


Full conceptual design 


Coordinated design and procurement 


Construction stages Production Production information 
Construction 
Post-construction stages Maintenance Operation and maintenance 
Demolition Disposal 


Different classes of information are proposed by ISO 12006 standard, related to resources, as construction 
information, products, agents and aids; or relatated to process as management and construction process; related to 
result as construction complex, entity, built space, element and work result; or related to property (table 3). 
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Table 3: Framework for classes of information about construction works (ISO 12006-2:2015). 


Class 


Classes related to resource 


Classified by 


Construction information 


Content 


Construction product 


Function or form or material or any combination of these 


Construction agent 


Discipline or role or any combination of these 


Construction aid 


Classes related to process 


Function or form or material or any combination of these 


Management 


Management activity 


Construction Process 


Construction activity or construction process life cycle stage or any 


combination of these 


Classes related to result 


Construction complex 


Form or function or user activity or any combination of these 


Construction entity 


Form or function or user activity or any combination of these 


Built space 


Construction element 


Form or function or user activity or any combination of these 


Form or function or user activity or any combination of these 


Work result 


Form or function or user activity or any combination of these 


Classes related to property 


Construction property 


Property type 


Table 4: Some examples of BIM classification. 


BIM oriented classification 


| BIM community Classification system 


Uniclass 2015 
OmniClass 
MasterFormat® 
UniFormat™ 
CoClass 

CCS 

TALO 2000 


NS 3451 & TFM 


Industry Foundation Classes 
buildingSMART Data Dictionary 


ETIM 


Language 


Type 
Project 
o 


1e) 
O 


Implementation 
Research 
Collaborative initiative 
Other 


Category: 


fe) 


O 
oO 
O 


3D — Virtual Design & Construction 
Lean & industrialized construction 
Planning and budgeting 
Subcategory: 
= Strategies 
Edification 
Project 
Workflows 


The framework for classification of ISO 12006 about construction works also introduces a set of different 
relationships between the different classes of information. The organization model or user activity of the built asset 
uses the built space that is defined by a construction result, that is part of a construction complex. A construction 
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complex is an aggregate of construction entities, composed by construction elements. A construction results is 
developed by a construction process that is divided in pre-design, design, production and maintenance processes. 
Construction process uses construction resources that can be construction product, construction aid, construction 
agent and construction information (ISO 12006-2:2015). Classifying data means structuring it in an agreed way 
so that different actors can easily find what they need and understand it. A classification system is like a common 
language. In BIM, classification lets people, software and machines share and use building information efficiently 
and accurately. Different classification systems have been developed for different types of BIM data and actors, 
and for different geographic areas and situations. In table 4 some other examples of BIM classification are 
presented. 


Table 5: Metadata of BIM education Models. 


Text English, Finnish, Estonian, Italian The language(s) used in the model to describe 
the content 
Text Office, Teaching, Care, Residential Property used to describe the dominant 


function/use case for the facility 
Text Urban, Architecture, Landscape, Interior Design, The model discipline prepared by or for the 
Structural Engineering, Building Services Engineering purpose of the given discipline. 


(HVAC and MEP), Construction Engineering, Facility 


Maintenance 
Text Small, Medium, Large Reflecting on the size of the building, relative 
to its building type. 
Text Mass, Room/Space/Zone, and Element models The type of model content 
Text Strategic Planning, Brief, Programming, Schematic The stage of the model prepares in or for 
Design, Preliminary Design, Design Development, 
Detailed Design, Pre-Construction, Construction, 
Commissioning, Hand-Over, Use, Renovation, 
Disassembly, Demolition 
Text Gather, Generate, Analyze, Communicate, Realize Penn state classification for BIM uses 
Text Initial, Defined, Managed, Integrated, Optimized The mature of the model in any specific 
stage. 
Text Symbolic, Generic, Detailed, Fabrication Average accuracy of geometry in the model. 
Text Preliminary, Proposed, Coordinated, As-Built The state of the information in the model, its 


reliability with respect to itself and others in 
the process 


Text CCI, Uniclass, Masterformat 
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As a first approach the following classification systems for Open learning resources OLR were proposed for the 
BLE platform: metadata, building type, size of the project, different plans, life cycle period, model categories, 
model functions, language/country. Metadata classification was chosen as the easiest way of OLR categorization. 
Many metadata of BIM models can be detected, and different categories of information can be listed in the 
repository for each piece of OLR. Again, a list of metadata of BIM education models is presented in table 5. 


The BIM-enabled learning environment is a prototype for online BIM models repository (fig. 5; fig.8). The 
proposed categorization system of BIM models is based upon five categories: discipline, type of building project, 
life cycle stage, model use and BIM dimension (fig. 9). 


UniBO DB AlContents Examples Components Alte Rewits Heme Q 


a welcome 


| = re | 


Fig. 8: BENEDICT DB — Unibo server and project data repository. 


A prototype for Unibo server that host the CMS and the OLR database was developed for the proposed online BIM 
models repository of the BENEDICT project (http://ble.unibo.it). In the welcome page (fig. 8) it is possible to 
download a guideline to help end users better use the platform . From the home page, end users also could access 
several sub-pages including “Examples”, fully solved BIM solutions that students can use as examples, 
“Components” or BIM objects, “Aid” including BIM documentation, standards, project data, and “Results” where 
students’ outputs are stored. The repository also provides a powerful searching engine to help quickly find useful 
information from the repository. 


4. CONCLUSIONS 


In conclusion, Building Information Modelling (BIM) has become increasingly important in the field of 
Architecture Engineering and Construction (AEC), particularly for construction simulation and project 
construction management. The availability of data, reports, and information is crucial for students, teachers, and 
operators in the AEC industry to create BIM models. The BENEDICT project, a European Erasmus plus KA2 
project, aims to develop a web-based platform for BIM teaching that is closely connected to the AEC industry. 
This platform, known as the BIM-enabled Learning Environment (BLE), provides a repository for BIM models, 
open learning resources (OLR), and students’ outputs that includes a Content Manaagment System CMS and a 
Data Base. The CMS and the DB are freely accessible to registered users that can access OLR are essential for 
BIM-enabled learning processes and provide real-life project data for learners to work with. The BLE platform 
categorizes BIM models and elements, allowing for effective information management and knowledge transfer 
between different players in the AEC industry. By incorporating OLR and BIM workflows, the BLE platform 
enhances learning experiences and supports the implementation of BIM-based project planning and control. 
Ultimately, the BENEDICT project and the BLE platform contribute to bridging the productivity gap in the 
construction industry by promoting the use of BIM and providing a collaborative learning environment for students 
and future practitioners. 
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Fig. 9: BIM-enabled Learning Environment (BLE) categorization. 
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ABSTRACT: In recent years, technology has been playing a transformative role in the field of built environment, 
architecture, and construction education. It can be argued that the emergence of digital technologies has 
revolutionised the approach to teaching and learning in higher education in these fields. Digital technologies, 
such as Artificial intelligence (AI), additive manufacturing, robotics, 3D laser scanners, and Immersive Realities 
(IR), have played a crucial role in enhancing sustainability and efficiency in the industry. However, the 
opportunities provided by the use of these technologies (as a single tool or combined) in higher education and 
within the field of Architecture, Engineering, and Construction (AEC) are still relatively unexplored. To address 
this gap, this work presents a novel pedagogical framework aimed to enhance students’ literacy on emerging 
technologies, and increase their criticality, and understanding of professional practices along with the related 
ethical challenges. Furthermore, to assess its effectiveness regarding the integration of immersive VR technologies 
in the teaching practice, a learner-centred evaluation approach is proposed, based on the collection and 
correlation of both qualitative and quantitative data. Concerning the former, a dedicated questionnaire is 
developed to collect students’ subjective feedback. For the latter, a method for tracking their use of space in the 
virtual environment is discussed. Both the immersive pedagogical framework and evaluation approach presented 
in this work will be implemented in diverse architecture and civil engineering master classes in Australia and in 
Italy, and their comparative outcomes and validation will be the object of future joint contributions. 


KEYWORDS: Digital and Immersive Pedagogy, Digital Technology; Architectural Higher Education; Immersive 
Reality, Evaluation, Questionnaire, Spatial tracking. 


1. INTRODUCTION 


Over the past decades, there has been a rapid growth in urbanisation and development of urban areas across the 
world. This has resulted in a considerable increase in the demand for new buildings, structures and infrastructures, 
which, in turn, brings along a number of environmental, social and economic challenges. The nature of such 
challenges has become increasingly complex, so much so, that the traditional and conventional methods of 
construction cannot address them. Emerging technologies and digital tools have been increasingly applied in this 
field to address such challenges and achieve a more sustainable, safe, efficient and optimized practice (Alsafouri 
& Ayer, 2018; Alsafouri & Ayer, 2019; Ardito et al., 2019; Davila Delgado, Oyedele, Beach, et al., 2020; Davila 
Delgado, Oyedele, Demian, et al., 2020; Fazel & Izadi, 2018; Hajirasouli & Banihashemi, 2022; Hajirasouli et al., 
2022; Hamzeh et al., 2019; Mandolla et al., 2019; Moon et al., 2015; Nafors et al., 2020; Rohani et al., 2014; 
Valero et al., 2015). Among such technologies, Immersive Realities (IR) such as Virtual Reality (VR) and 
Augmented Reality (AR) have proven to be very advantageous in various areas of built environment. When 
considering architecture discipline and the required spatial qualities, capabilities and understanding required for 
it, VR seems to be a more appropriate tool to incorporate in its pedagogy, with multiple advantages (Getuli et al., 
2020; Hajirasouli & Banihashemi, 2022; Hajirasouli et al., 2022; Rahimian et al., 2019). Despite the emphasis that 
have been made by a number of studies and scholars regarding the development of digitally enhanced and 
technology-integrated teaching methods (Aydin & Aktaş, 2020; Bashabsheh et al., 2019; Ceylan, 2021; Hajirasouli 
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& Banihashemi, 2022; Shirazi & Behzadan, 2015), a number of studies conducted by authors have identified that 
the current teaching and learning practices in the AEC higher education does not adequately embrace and 
incorporate such digitally enhanced methods and, therefore, are yet to respond to the industry’s demand in this 
area (Getuli et al., 2020; Hajirasouli & Banihashemi, 2022; Hajirasouli et al., 2022; Pour Rahimian et al., 2019). 
More importantly, when developing such pedagogies, how would this affect students' perception of their education 
and impact their teaching and learning experience. 


This study aims to provide an in-depth understanding of student's needs and requirements and, more importantly, 
their perception of the integration of new technologies in their courses and curriculum. For this purpose, in Section 
2, a theoretical framework was developed to establish a novel immersive pedagogy. Building on the constructivist 
assumption of active learning for a digitally enabled pedagogy, a problem-based learning process fostered by 
immersive technologies is conceptualized. Furthermore, to assess the framework's effectiveness, an original mixed 
method for the evaluation of VR-based teaching experiences for AEC students is discussed in Section 3. This 
comprises a dedicated questionnaire to be administered to the students after the immersive learning experiences to 
collect their subjective perceptions and feedback. Complementarily, the tracking of their virtual position and its 
restitution in the form of contextualized heatmaps is proposed to objectively evaluate their use of the virtual 
environment in relation to the learning objectives. In Section 4, the implementation plan of the proposed 
pedagogical framework and evaluation method is then presented with reference to the architecture and civil 
engineering master classes which will be involved both in Australia (Western Sydney University) and Italy 
(University of Florence). Eventually, the discussion of the limitations and outlook of this study is provided in 
Section 5. 


2. PEDAGOGICAL FRAMEWORK 


To develop the proposed theoretical model a critical literature review was conducted, including the relevant works 
related to previously developed models in this area. From their analysis, constructivism philosophy emerges as the 
most appropriate teaching philosophy to be adapted to this framework, due to the constructive nature of immersive 
technologies and their implementation in teaching and learning activities. In fact, constructivism philosophy is 
correlated with creating and constructing new knowledge based on the learner’s already existing knowledge, 
therefore implying an active and continuous participation in the process of learning (Behzadan et al., 2015; 
Behzadan et al., 2011; Biggs & Tang, 2007; Bruning et al., 1999; Lord, 1999; Luo & Mojica Cabico, 2018; Tynjala, 
1999; Von Glasersfeld, 1995). Hence, the constructivist approach was used as the principal philosophy for the 
developed model. 


Choosing the right approach for implementing this model was the next step of this work. It is suggested that the 
concept of digital pedagogy does not only reflect upon using digital tools and technologies, rather, it is also about 
cautiously considering their effects and implications from a critical pedagogical point of view. Therefore, the 
decision about their integration within the teaching approach or not, depends on the desired learning outcome of a 
course or subject (Anderson, 2020; Barber et al., 2015; Croxall, 2013; James & Pollard, 2011). Problem-based 
learning approach was also used in this model as an integral part of the application and integration of immersive 
technologies. The selection of this approach was also due to its suitability for complex real-world situations where 
there is no right or wrong answer to the problem (Barber et al., 2015; Savin-Baden, 2007; Word, 2003), which is 
the main focus of AEC discipline and industry. This approach helps students to work collaboratively in groups, to 
identify the problems and gaps, and to develop solutions and knowledge, through self-directed processes (Barber 
et al., 2015; Savin-Baden, 2007; Word, 2003). 


Eventually, immersive learning was also used as the last stage of this model. Using this method, the creation and 
construction of knowledge occurs through virtual immersion into a context, dialogue and/or situation. Immersion 
occurs in two different ways: immersion through narrative (cognitive aspects), and immersion through 
technological devices (technical aspects). This study focuses on immersion through technical devices, hence 
requires various tools and technologies, such as AR, VR, and Virtual Learning Environment (VLE). The choice 
of tools and technologies in this model depends upon the level of immersion required for a learning objective. VR, 
which is the subject of this study, is mainly being used when a fully immersive experience and a sense of presence 
in the virtual environment is required for the learning process. 


3. IMMERSIVE VR LEARNING EXPERIENCE EVALUATION 


The proposed pedagogical approach theorises the beneficial impact that the adoption of immersive visualization 
technologies can provide in AEC master classes. To support this claim and assess its effectiveness in upcoming 
case study implementations (see Section 4), an original method for the evaluation of immersive VR learning 
experiences is developed, considering both qualitative subjective data and quantitative objective observations. As 
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shown in Figure 2, a questionnaire to be administered to the student after the immersive experience and divided 
into five inquiry areas is provided. Besides, the student’s use and understanding of the virtual environment in 
relation to the learning objectives is evaluated with the acquisition of their spatial track during the VR experience 
and its sequential visualization of a BIM environment in the form of a heatmap. 


Constructivism 


Learning 


Digital Learning 


Making a decision not to 
use digital technologies 


Immersive Learning 


Immersion using 


narrative 
$ ł 
Virtual Learning Augmented Reality 
Environment 


Fig. 1: Pedagogical framework designed for BIM-enabled VR-based technology application into architectural 
design studio. 
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f 210 & Demographics (also for students’ demographics) 
Evaluation of students’ satisfaction 
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y and “mation sickness” symptoms 


Evaluation of the effectiveness of 


Virtual Contents Effectiveness the contents virtual representation 


Qualitative Data 


(Questionnaire) uation lectivn 
Learning Experience Effectiveness ——*> Eval of the off oes of 
the immersive learning oxperionce 


, Evaluation of perceived potential 
benefit of new features 


LŠ suggestions (Open) Collection of students’ personal 
99 impressions and ideas 


i Analysis of the students’ path in 
Sanep Teac the immersive virtual environment 


» Future Developments - 


Quantitative Data 


(Spatial tracking) Student Temporal Use of Space Analysis of the time and attention 
(Heatmap) spent in different virtual locations 


Fig. 2: Immersive VR learning experience evaluation approach — Data schema. 
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3.1 Post-experience evaluation questionnaire 


To collect students’ subjective feedback, a dedicated questionnaire has been developed to be administered directly 
after the immersive learning experience. First, data necessary to classify, analyse and classify the questionnaire 
responses based on students’ demographics are collected. Then the experience is evaluated against the student’s 
perceptions, concerning: the engagement with the virtual environment (immersivity); the reproduction of the virtual 
contents (contents effectiveness, e.g., graphical realism compared to the objective of the session); the gained self- 
efficacy and competency (learning experience effectiveness); the opinion on suggested additional features that 
could be implemented in the experience and that were currently missing (future developments). For the cited 
criteria the responses are collected according to a scoring system ranging from one to five, where five correspond 
to the maximum agreement with the statement or satisfaction with the experience. Furthermore, an open textual 
field (suggestions) is provided to enable the collection of personal remarks, impressions, and ideas, involving and 
empowering the student in the improvement and evolution of a learning approach based on ever-changing 
technologies and that will need to keep up to learners’ expectations and needs to be future proof. In Table 1 the 
data type and the description and purpose of data collection are reported along with explanatory examples. 


Table 1: Questionnaire data type description and prompt example 


Description Example 


Data type 


Questionnaire ID: (Text) 


Data necessary to analyse and classify the responses collected through the Immersive experience: (Text) 


ID & demographics 


questionnaires with reference to the occurred immersive learning 
experiences, and to cluster their results also with reference to the 


students’ demographics (anonymized). [Various] 


Date: (yyyy/mm/dd) 

Age: (Number) 

Gender: (Multiple options) 
Course: (Text) 


Questions related to the student’s ability to get immersed in the virtual 


experience. They are useful for assessing how engaged the students were 


How much did you like the 


Immersivity in the virtual world and to evaluate their satisfaction in terms of both ease experience? 
of use and comfort (also related to the possible onset of symptoms of [min 1; max 5] 
"motion sickness"). [Single rating — Likert scale] 
T a ; . a How efficiently and clearly does the 
Questions regarding the virtual representation of the learning contents, ; 
Contents ; p . A i content of VR help you to perceive 
i aimed at evaluating the effectiveness of the experience against the : k 
effectiveness the discussed subject better? 


objectives of the session. [Single rating — Likert scale] 


[min 1; max 5] 


Learning experience 


Questions pertaining to the overall effectiveness of the experience, aimed 


at investigating the actual usefulness of the immersive VR learning 


Do you think this experience is 


useful in understanding the qualities 


effectiveness session compared to traditional methods, especially concerning the of the designed spaces? 
learning objectives. [Single rating — Likert scale] [min 1; max 5] 
Questions concerning the introduction of new features (e.g., content Would you like to be able to grasp 
Future animations, audio and visual effects, etc.) or virtual content aimed at and interact with objects in the VR 
developments enhancing the immersive VR learning experience through the inclusion of | environment? 
greater realism and/or interactivity. [Single rating — Likert scale] [min 1; max 5] 
; Open questions to collect personal impressions and ideas from the : 
Suggestions Suggestions 


students. [Text box] 


3.2 Student’s spatial tracking and use of space visualization 


In the previous paragraph, it has been discussed how the students are actively involved in providing data for a 
qualitative evaluation. Here, the second, complimentary, data acquisition method is presented which is based on 
the objective observation of the virtual positions covered by the students throughout the experience. In turn, this 
involves capturing the position of the student in the VR environment during the entire course of the simulation in 
order to analyse their actual use of the tridimensional space. For this purpose, the student's virtual position shall 
be recorded with a data acquisition rate of at least 1 Hz. The 3D point sequence resulting from this process shall 
be transferred in a BIM environment and converted for the generation of heatmap visualization, within which the 
position of the student is represented, weighted by the time spent, with a colour gradient. In this way, the relevance 
of different areas of the experience can be evaluated based on the time spent in certain virtual locations by the 
students. As with the questionnaire’s development, the student spatial tracking and visualization procedure to be 
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performed during the learning experience is explained with reference to the collected data type in Table 2. 


Table 2: Student’s spatial tracking and use of space visualization characteristics 


Data type Description Example 


ID & demographics See Table 1. See Table 1. 


The duration of the experience from start to finish, excluding a possible 


: ? tutorial or time required to the student to get used with the immersive VR 
Learning experience 


F: system and controllers. This is necessary to allow for the later heatmap Duration: (number) sec 
uration 
visualization of the student use of the virtual space weighted on the 
overall elapsed time. [time in seconds] 
The student position in the virtual environment is collected as 3D point . f 
J 4 i i . Colour gradient representation of 
. with 1 Hz frequency. The corresponding spatial track is then graphically 
Student spatio- 5 i $ the student’s followed path 
represented against a model of the experienced environment (e.g., BIM : f 
temporal track (weighted by time) 


model) both as a 3D path (polyline) and with an heatmap representation, : 
: . ‘ : i i i [min. green, max. purple] 
with the colour gradient weighted on the time spent in a certain location. 


4. FRAMEWORK IMPLEMENTATION PLAN 


The theoretical approach of this paper is built upon previous research projects, undertaken by authors in Australia 
and Italy, to create an innovative approach and prototype protocol for the design, delivery and evaluation of a 
number of subjects for AEC students in higher education, based on an interactive and immersive learner-centred 
approach. The outcome of this work will be implemented in a number of course subjects at the University of 
Florence, Italy and Western Sydney University, Australia. The nominated subjects for the implementation of this 
model are reported in Table 3. 


Table 3: Immersive VR learning approach for AEC higher education - Implementation plan 


Institution Course (Academic Year 2023/2024) Expected attendees 
Western Sydney eo 
: . i e Advanced Design Communication (ARCH7007) 50 
University, Australia > Acne 
e ARCH7015 Practice Research Studio Civic (ARCH7015) 50 
University of Florence, ° BIM and Information Modeling of the Construction Process (B028836) 50 
Italy e Design and Safety of Workplaces B030584 (B063) 50 


The implementation of the proposed pedagogical model will engage the students in providing indications and 
opinions regarding the environment in which they are exploring and studying. This approach will help with 
validating both the pedagogical framework, as well as, the designed environments, by gathering the user’s 
experiences and observations. This, in turn, will assist in enhancing the entire framework developed in this study. 
For this purpose, and according to the principles discussed above, a prototypical questionnaire implementation 
comprising 16 questions has been developed and is represented in Figure 3. 


In addition to the questionnaire, a prototypical implementation of the student spatial tracking restitution in form 
of heatmap is proposed in Figure 4. As it can be seen, the more relevant areas of the virtual environment for 
learning purpose could straightforwardly be inferred based on the time spent, providing useful information in the 
development of further immersive teaching material (e.g., virtual environments). 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


Student Data 


Questionnaire in - max 


1) Did you enjoy the experience? 1 
Immersivity 2) Were you comfortable during the experience? 1 
3) Was it casy to use the viewer and controller? 1 
4) Did it bother you not being able to see your body in the virtual world? 1 
5) Are the signal arrows helpful in figuring out where to go? 1 
6) Does the virtual building spaces sufficiently replicate the real one’? 1 
7) Did you need less time to understand the content displayed in VR compared to a face-to- 
Contents’ face power-point presentation? 
Effectiveness ) 8) Were your learning skills affected by the VR experience? 1 
9) Do you think this experience can replace the role of the educator? 1 
10) Do you think this experience is useful in getting to know a subject before it is explained 
to you by an educator ? 
11) Do you think this experience is useful in understanding the building and its technological 
Learning aspects? 
Experience's 12) Is this type of learning useful for an inexperienced student? 
Effectiveness 13) Is this type of learning useful for a student without any knowledge on the subject? 
14) Do you preter to interact with an avatar or a teacher? 
Future { 15) Would you like to listen audio explanations of the virtual scenarios? 
Development 16) Would you like to be able to interact with virtual objects? 


wv PAAann nun 


NYNNE NNNŞN N 
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Suggestions 


Fig. 3: Immersive VR learning session evaluation questionnaire — Implementation prototype. 


Site Scenario 
Plan 


Space Usage 
Gradient 


Fig. 4: Student spatial track heatmap representation — Implementation prototype 
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5. CONCLUSIONS 


This study, which was a successful collaboration between University of Florence and Western Sydney University 
aimed to provide a comprehensive understanding of student’s perception and demand of the newly developed 
pedagogical framework. This framework aimed to create a more engaging learning environment, while responding 
to the industries needs and requirements, and prepare students for the future of their careers in AEC. To achieve 
this, this study was developed in two stages. In stage one, a qualitative approach was used to develop a novel 
pedagogical framework, based on the constructivism method, followed by a problem-based approach and 
immersive learning method. To test the effectiveness of this developed model and student’s perception of it, a 
mixed-method approach was used to develop an in-depth questionnaire. Furthermore, a contextualized heatmap 
recording is accompanying the developed survey to ensure the robustness of the results of this study. 


The outcome of this research will be tested internationally at the architecture and civil engineering master classes 
which will be involved both in Australia (Western Sydney University) and Italy (University of Florence) in early 
2024. The outcome of the experiment will form another joint publication and collaboration between the two 
institutions. 
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ABSTRACT: As mixed-reality (XR) technology becomes more available, virtually simulated training scenarios 
have shown great potential in enhancing training effectiveness. Realistic virtual representation plays a crucial role 
in creating immersive experiences that closely mimic real-world scenarios. With reference to previous 
methodological developments in the creation of information-rich digital reconstructions, this paper proposes a 
framework encompassing key components of the 3D scanning pipeline. While 3D scanning techniques have 
advanced significantly, several challenges persist in the field. These challenges include data acquisition, noise 
reduction, mesh and texture optimisation, and separation of components for independent interaction. These 
complexities necessitate the search for an optimised framework that addresses these challenges and provides 
practical solutions for creating realistic virtual representations in immersive training environments. The following 
exploration acknowledges and addresses challenges presented by the photogrammetry and laser-scanning pipeline, 
seeking to prepare scanned assets for real-time virtual simulation in a games-engine. This methodology employs 
both a camera and handheld laser-scanner for accurate data acquisition. Reality Capture is used to combine the 
geometric data and surface detail of the equipment. To clean the scanned asset, Blender is used for mesh retopology 
and reprojection of scanned textures, and attention given to correct lighting details and normal mapping, thus 
preparing the equipment to be interacted with by Virtual Reality (VR) users within Unreal Engine. By combining 
these elements, the proposed framework enables realistic representation of industrial equipment for the creation 
of training scenarios that closely resemble real-world contexts. 


KEYWORDS: Digital twin; 3D reconstruction; Virtual reality; Laser scanning; Photogrammetry, Training 
simulation; Unreal Engine. 


1. INTRODUCTION 


In recent years, the increased availability of mixed-reality (XR) technology has spurred the exploration of virtual 
reality training environments, which showcase their immense potential in enhancing training effectiveness across 
various domains(Abulrub et al., 2011). By reducing expenditure associated with travel and physical resources, 
safety training that has been delivered via virtual methods is predominantly more cost-effective than non-virtual 
alternatives, without sacrificing training effectiveness (Adami et al., 2021) (Stefan et al., 2023). 


Virtual Reality (VR) can present us with realistic replications of real-world situations with a high degree of 
accuracy, and immersive virtualised training scenarios can significantly improve participant engagement when 
compared to equivalent training using conventional methods (Sacks et al., 2013). Trainees presented with a virtual 
environment can engage with high-risk scenarios without actual danger. The elimination of risk fosters confidence 
and risk-free experimentation, which has a significant positive impact upon post-training technical proficiency 
(White & Jung, 2022). Regarding the attitude of trainees towards professional learning content, Loosemore and 
Malouf (Loosemore & Malouf, 2019) suggest that there is “a need to adapt safety training to create more emotional 
connection” between the trainees and their learning within the construction industry, and that “New technologies 
such as virtual reality may be useful this context since through [life-like] immersion in the work environment and 
simulation of workplace accidents, they are able to create a stronger emotional connection with the subject matter.” 
This suggestion is supported by Newton, Wang and Lowe (Newton et al., 2015) who find that “incongruously, 
results indicate that user’s reporting their experience of virtual reality score that experience higher in presence 
terms than users experiencing the physical world,” indicating that virtual experiences may be more emotionally 
engaging and more impactful for trainees than real-world experiences alone. This calls us to re-examine our 
approach to training and education as we begin to see XR technology as an effective tool to enable trainees to 
connect theoretical knowledge and practical application. 


The standard of these simulations is influenced by the quality of virtual representation. High-fidelity 3D illusions 
bridge the gap between physical and digital environments and enhance the task-oriented performance of the 
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trainees (Slater, 2009) and so highly-realistic virtual assets may improve the effectiveness of the virtual experience. 
1.1 3D Scanning Methodologies 


To elevate the authenticity and realism of virtual training, exploring 3D scanning methodologies (such as 
photogrammetry and laser scanning) present exciting possibilities as potential solutions for highly realistic 
representation within VR scenarios. By employing advanced 3D scanning technologies, we can capture with 
accuracy the dimensions and intricate surface details of real-world equipment and environments. After sufficient 
data has been captured with scanning hardware, the data will be manipulated through a pipeline of various 
specialized 3D modelling software to create a mesh that may be rendered by a games engine. 


There are practical challenges associated with the application of 3D scanning techniques which must be addressed, 
such as site-access for data acquisition, followed by noise reduction and asset optimization. To conduct the training, 
the user will be expected to manipulate the asset, or parts of the asset, using virtual reality hardware. Therefore, 
not just aesthetic accuracies, but realistic interaction and functionality will also be essential. Equipment which has 
independently moving components will have to be separated into dynamic and static bodies to facilitate 
independent movement and interaction within the virtual environment. 


1.2 Goals of this Article 


Our effort to establish a framework that adheres to industry best practices has been in collaboration with The 
Faraday Centre, recognised for their expertise in electrical engineering training. Ordinarily, The Faraday Centre 
delivers training using out-of-service switchgear that has been refurbished or donated to the Centre, so that trainees 
can receive hands-on practical training with switchgear up to 33kV. A significant challenge presented by electrical 
engineering equipment is that there are high costs associated with the newer, higher-voltage switchgear, thus 
making their acquisition impractical. A virtual training environment (VTE) offers a cost-effective alternative to 
simulate operation of this high voltage equipment for training purposes. Our data-driven approach hopes to ensure 
that the virtual representations closely mirror their physical counterparts. 


Therefore, we believe that establishing a framework encourages the integration of virtual technologies for 
industrial training scenarios. Our objective is to provide insights into the scanning methodologies, challenges faced, 
and available solutions in capturing the details of real-world environments, equipment, or other assets. To achieve 
this, this paper will review the current technology and methodologies used to emulate real-world equipment and 
their processes within a virtual context. Drawing inspiration from methodologies employed in data-driven digital 
twinning pipelines (Pan et al., 2022), both photogrammetry and laser-scanning applications are integrated within 
this framework and their compatibility with the development of contemporary professional training for high-risk 
environments is discussed. The framework proposed is capable of systematically addressing each obstacle, thereby 
ensuring a seamless transition from physical equipment to the creation of highly realistic virtual training 
environments. 


This paper is organized as follows: section 2 will look review production pipelines, methods and motives for the 
creation of such data-driven virtual assets. Section 3 presents an overview of the technology required to scan a 3D 
object and recreate it as 3D virtual asset. Section 4 will report the framework we have developed as a solution to 
the challenges presented when developing realistic VR-ready assets from high-voltage switchgear scan-data. 
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2. METHODS FOR REALISTIC VIRTUAL REPRESENTATION 


Virtual representation encompasses the creation of digital reconstructions of real-world subjects, including those 
with glossy surfaces like switchgear equipment. Specular (mirror-like) reflections can challenge 3D data capture 
methods like laser scanning and photogrammetry, reducing the usefulness of output models (Frost et al., 2023), 
therefore we will review approaches designed to address issues associated with capturing accurate data. 


Another challenge involves minimizing the computational power required for rendering our results in a real-time 
application. Two viewpoints (one for each eye) must be rendered, making VR susceptible to difficulties with 
framerate, which will be affected negatively by superfluous model complexity. Therefore, our review will be 
extended to provide an overview of various methods to clean and simplify our results. 


2.1 Photogrammetry 


Photogrammetry is a 3D surveying and modelling method which has the major advantage of being low-cost, 
portable, flexible and is capable of delivering highly detailed reconstructions. Three-dimensional information 
about objects or environments is obtained by analyzing a dataset of two-dimensional photographs. 


Photogrammetry relies on the identification of feature points on or within the object being scanned. Areas of the 
subject with aspects like colour variation, surface imperfections, or details such as dust and grime must be 
adequately captured to be reconstructed. Significant overlap across multiple images in the dataset is crucial to 
ensure an ample supply of contrasting, unique points. Observed similarities across images is used to reinforce the 
confidence of the photogrammetry software in determining the 3D positions of each point. Available 
photogrammetry software options are discussed in Section 3. 


Retopology: 
Photogrammetry reconstruction: 


Shoot pictures of the subject: 


The initial phase involves capturing 
multiple images of the subject 
from various angles and positions. 
These images serve as the raw data 
for the subsequent steps. 


This stage involves using 
specialized software to process the 


captured images. Photogrammetry _ 


algorithms analyze the images, 
detect feature points, and create a 
3D point cloud or depth map of the 
subject. 


After generating the 3D model, it 
often requires creating a more 
efficient mesh to optimise 
simulation performance. 
Additionally, UV mapping is 
performed to prepare the model 
for texturing. Any holes or 
imperfections in the mesh are 
addressed. 


a 


Reprojection: 


Once the model is optimized, the 
next step is to reproject the high- 
resolution textures or color 
t> information from the original 
images onto the optimized 3D 
model. This ensures that the final 
asset retains the visual details 
captured during shooting. 


Post-clean: 


This step involves refining the 
model and texture. It includes 
cleaning up any artifacts or 
anomalies in the mesh and 
textures, adjusting lighting and 
shading, and fixing any incomplete 
parts of the model or texture. 


Export: 


The final phase includes baking 
necessary texture maps (such as 
normal maps, ambient occlusion 
maps, and specular maps) and 
exporting the asset in a format 
suitable for integration into a game 
engine. Again, decimation may be 
performed to optimize the asset's 
polygon count for real-time 
rendering. 


Fig. 1: A Photogrammetry process diagram showing an overview of the various stages from data capturing to 


a simulation-ready asset 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


The accuracy of camera alignment and the quality of the created asset is determined by consistency across the data 
obtained from the input images. In cases where the object's surface lacks distinctive features, challenges arise in 
achieving accurate surface reconstruction. According to Schiach, the objects best suited for automated image- 
based 3D reconstruction methods feature amorphous geometries, structured surfaces, numerous edges, and exhibit 
inhomogeneous colouring. Objects that yield poor or no results typically have monochrome, translucent, reflective, 
or self-resembling surfaces (Schaich & Fritsch, 2013). Dark materials, insufficient lighting, and changes in lighting 
can all have detrimental effects on the image quality and may prevent the photograph from registering as correctly 
aligned. Methods we may employ to optimise the conditions in which we capture data include strategic distribution 
of light sources to eliminate shadows, applying a coat of spray to make the surface more responsive to scanning, 
cross polarisation techniques, or by using some combination of these methods (Noya et al., 2015; Porter et al., 
2016). 


2.1.1 Capture methods 


To capture a static object, the photographer moves around the subject, taking multiple pictures from various 
viewing angles. Collecting every angle may be made difficult if the object is quite large and/or positioned 
inconveniently for photo-scanning purposes, meaning a complete scan may be impossible without repositioning 
the object. For the feature detection algorithms to run correctly, the features of the input images must remain 
consistent. Therefore, if we wish to reposition the object, we must take the additional step of separating desirable 
features of our subject from undesirable inconsistencies from background visual information. Typically, this 
involves manually applying masks to each input image, a potentially time-consuming process (Farella et al., 2022), 
even with expediating background removal features like semantic segregation (Chen et al., 2017; Kang & An, 
2021; Ronneberger et al., 2015). 


Alternatively, a camera configuration with strategic lighting can be set up to automate the masking process. 
Background interference may be avoided by ensuring the scanned object is well-lit against a dark, featureless 
background. This allows for the target to be rotated and repositioned in front of a camera which may remain fixed, 
providing sufficient captured data from various viewing angles, without the feature detection algorithms being 
disrupted by undesirable information. The effect of this method may be improved by strengthening the lighting of 
the foreground to heighten the contrast between the foreground and background. This lighting can be provided in 
different ways, the object may be homogenously lit with LEDs from various angles, or a piece of equipment such 
as a ring light may be employed; both may sufficiently eliminate shadows. 


Data Collection Reality Capture Blender f Instant Meshes | 


Manually stitch Decimate mesh 
scans, synthesising to reduce polygon 
amesh count 


Dorsal photos 
Multi-camera array 


Ventral scan 
reconstruction 
Scan 2 
Additional 
individual photos Scan alignment 
Handheld camera 


Draw seams to 
guide retopology 


for clean-edge UVs 
Dorsal scan 


reconstruction 


Scan i 
Ventral 


(underneath) Create a new mesh Recover high- 


photos using the aligned f 
requen 
Multi-camera array : g q cy = 
scans as a template geometry details 


Fig 2: A flowchart describing the process used to create a clean asset from a photogrammetry reconstruction 
using a multi-camera array to capture a turtle (Bot et al., 2019). The software used is included. Recovering 
high-frequencies geometry details will be expounded upon in Section 2.3. 


To capture dynamic objects, a single camera is unsuitable as it presents a high risk of capturing inconsistent data 
due to movement of the subject. Therefore, a multicamera array is used, which typically consists of 4 to 30 cameras 
on tripods or metal rods, with all of them pointing towards a central area. This “rig” of specially calibrated lights 


277 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


and cameras permits efficient and simultaneous data capture from various angles to ensure consistency across 
source images. An alternative method is the use of synchronized video with a common motion (e.g., a clapper or 
a ball drop in view of all cameras). Figure 2 shows the methodology employed by (Bot et al., 2019) when using a 
multi-camera array to scan and create and asset that captures the likeness of a turtle. 


2.1.2 Cross polarisation and reflectance acquisition 


VR is capable of simulating realistic lighting and accurate material properties. Reflectance acquisition techniques 
are used to measure an object's reflectance properties under varying lighting conditions. One such approach using 
polarisation techniques is outlined by figure 3, below. Numerous images are taken with different lighting 
conditions to sample the appearance of specular highlights under a dense sampling of lighting directions, which 
can be data-intensive and time-consuming, particularly when dealing with highly specular surfaces. 


Set up dark background, polarised light source and subject. Calibrate camera with polarisation filter to minimise reflections 


$ 


Adjust subject for 
differeny viewing > 
angle 


Adjust the camera's 
polarisation filter, 
rotating it 90° 


Repeat until capture 
is sufficient 


Capture cross- Capture parallel- 


polarised image polarised image 


c 
S 
E 
v 
v 
© 
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i] 
a 
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Create 2 datasets: one of the cross-polarised images and one of the parallel-polarised images. Pair corresponding images 
by renaming the parallel-polarised image to be identical to its cross-polarised counter-part. 


v 


Edit Layer Blend to Subtract so that the 


Overlay the 


Open parallel- a eae cross-polarised layer is subtracted from the Merge the layers by 
polarised image O one parallel polarised layer, necessary to flattening the image 
= i a £ calculate the light diffusion of the surface 
= 
° 
© 
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Fig. 3 A flowchart showing an overview of the data processing required to prepare what information is 
collected in a cross-polarisation method (Frost et al., 2023) to acquire reflective data, including the software 
employed. 
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Cross polarisation methods produce an image where most of the specular data is removed using two orthogonal 
polarisation filters. One filter is placed on the camera lens, and the other is a polarising film positioned in front of 
the light source to illuminate the target with polarised light. Cross-polarised images are highly effective for 
photogrammetry reconstruction as they minimise disruptions caused by reflections (Frost et al., 2023). 


The polarisation filter on the camera lens can be adjusted to be parallel rather than orthogonal, thus producing a 
corresponding image which preserves specular information. Subtracting the cross-polarised image from the 
parallel-polarised image yields a specular image. Collecting specular images from multiple camera positions 
allows us to create a specular map by replacing the cross-polarised data with the specular data during the 
reconstruction process. This map represents the reflectivity of the object's surface at different locations on the 
mesh. However, achieving this in uncontrolled environments, where ambient lighting is beyond our control or with 
large equipment that requires camera movement, can be challenging and result in inconsistencies. 


2.1.3 Colour correction 


To ensure the accuracy of the model texture, especially for its use in a games engine for simulation, managing 
lighting conditions is crucial. If lighting affects the color of the captured images, a Look-Up table (LUT) can be 
applied to the input images to correct their colour accuracy. Software like Houdini (SideFX, 2022) or Photoshop 
(Adobe, 2022) can generate this LUT from an image of a colour checker taken at the site under the same lighting 
conditions as the photos, and then batch process the input images, correcting colour information. 


Most games engines have their own lighting systems. Depending on the 3D objects being rendered, most 3D games 
engines simulate realistic shadows for objects in relation to in-simulation light sources. These shadows can be 
dynamically calculated at runtime, adjusting with user interactions or object movements. In some cases, shadows 
might be baked into the scene if they are not expected to change. If shadows were captured in the source photos 
due to non-flat lighting during image capture, they could inadvertently become part of the object's texture 
information. To address this, the shadow information should be removed. This can be achieved by opening the 
texture data from the UV maps in software like Photoshop, where adjustments can be made to minimize or 
eliminate the shadows. This process homogenizes and evens out the lighting affecting the texture, allowing the 
games engine's lighting to handle shadows appropriately. 


2.2 LiDAR 


In recent decades, point clouds obtained through light detection and ranging (LIDAR) have become a significant 
data source for various mapping applications within the photogrammetry, remote sensing, and cultural heritage 
communities among many others (Leberl et al., 2010) (Wang et al., 2018). There are two primary LIDAR methods 
to consider, laser scanning and structured light scanning. Both make use of time-of-flight (ToF) calculations, the 
scanner can determine the distance and create a point cloud of the object's surface. Their advantages include their 
noninvasive nature, high precision, and interoperate easily with supporting software. 


Aerial laser scanning (ALS) and Terrestrial laser scanning (TLS) are two examples of long-range scanning methods 
that rely on laser beam emission. The emitted lasers can reflect off of surfaces up to 130 meters away, and can be 
used to scan large objects such as airplanes. The Focus3D S120 (FARO) is a laser scanner employed by (Wang et 
al., 2019) as described in figure 5, so this method may be fit for our purposes, however, long-range can be more 
expensive and may require more time for data processing. 


Structured light scanners project patterns of light (such as grids or stripes) onto the surface of an object. The 
deformation of these patterns on the object's surface is captured by the scanner's cameras. The distortion of the 
patterns is then used to calculate the 3D coordinates of the object's surface points. Cui, Tao and Zhao acknowledge 
that the 3D light-section reconstruction method (depicted in figure 4) is a common and applicable way to obtain 
point cloud data for the needs of 3D reconstruction potential accurate to the millimeter. Structured light scanners 
are generally faster than laser scanners and are well-suited for capturing medium-sized objects with moderate to 
high surface details. 


However, like photogrammetric methods, structured light scanners struggle with reflective, transparent, or 
homogenous surfaces. Their accuracy can vary based on the complexity of the object's surface; for example the 
performance of these scanners suffers when there is a distinct lack of points of interest on the surface, as it makes 
it difficult for the algorithms within the software to accurately track the lasers position frame by frame. 
Consequently, the scanner will “slip,” leading to inaccuracies in scanning surfaces. We may mitigate some of these 
issues by scanning the surface multiple times, or by introducing additional features to aid 3D registration. 
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Fig. 4 A process diagram showing the light-section method for a structured light scan. (Cui et al., 2021) 


2.3 Model Synthesis for Virtual Reality 


To achieve realistic virtual representation, it's crucial to capture high-frequency details. However, this often results 

in high-polygon count 3D models generated by scanning methods, which can slow down real-time simulations, 
especially in virtual reality. Mesh decimation helps reduce the complexity by simplifying the mesh to a target 
polygon count, although some detail is lost in the process. As depicted in figure 2, in cases where the scan data has 
inconsistencies, further reconstruction and cleaning with 3D editing software might be necessary. Alternatively, 
the scan can serve as a reference for creating a new, more accurate mesh. 


High-frequency detail can be restored by generating normal maps from the complex mesh, which are used to 
create detailed shadows and highlights. Unwrapping the mesh's topology into UVs is required to store this data as 
a texture file. Specialised software such as InstantMeshes as mentioned in (Bot et al., 2019) or similarly specific 
tools like those of Houdini (SideFX, 2022) called Sidefx Labs which contains the AutoUV as used in (Triantafyllou 
et al., 2022). After retopologising the mesh, any available texture information can be reprojected. If the captured 


S 
£E 
KE 
eiit- | 
3 g Scan 1 Scan 2 Scan 3 
= 
2 x External body Detailed evidence Injury inflicting tools 
Noise filtration A 
=c 
ng 
3 
Automatical VXelements 6.1 High-resolution 
registion Creaform, Canada photographs 


c 
G 
2 
K 

3 

> 

v 
< 

oo 
= 

= 

Z 

© 

— 
a 

res 

v 

pad 

© 

Eni 


Overall scene z 
model Data Fusion B 


Environment 


Geomagic MeshLab 
3D Systems, USA OpenSource 


Scan analysis 


VR display 


Fig. 5: A diagram showing an overview of methods being used to reconstruct a detailed environment for VR 
(Wang et al., 2019) 
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texture data is insufficient, libraries like Quixel Megascans provide high-quality textures for approximating the 
surface material. 


One example (Alexander et al., 2009), involves the use of a stereo-camera rig with strategically placed lights. This 
rig is calibrated to capture multiple images simultaneously, each providing various lighting information: cross- 
polarized, parallel-polarized, and spectral line measurements for diffuse albedo, specular albedo, and 3D geometry, 
respectively. To aid data calibration, makeup dots on the target are used, ensuring they don't obstruct data while 
allowing precise realignment. By contrast. (Wang et al., 2019) merge laser scan point clouds using registration 
software. Ground control points (GCPs) and other scene features’ known locations are used to combine scan data, 
creating a comprehensive indoor environment reconstruction. Additional scans made with structured light scanners 
add more detailed information to specific areas of interest for analysis. 


Once the data from these various methods has been combined, the next key challenge lies in effectively separating 
these components to enable interaction within the virtual environment. Advanced VR interactions, characterized 
by direct manipulation, diverse input devices, and high degrees of freedom, demand the division of the unseparated 
scan-data model into distinct, potentially modular components. 3D modeling software will play a pivotal role in 
separating the components for independent simulation of their interactions. 


For training purposes, equipment behaviors will also require virtual recreation. While the best approach is to have 
firsthand expert demonstrations of the equipment, this is often not feasible due to factors like high risks and limited 
accessibility. In such situations, an alternative approach is to attach recording equipment to a professional who can 
perform the necessary operations. This recorded footage can then be used as a reference for replicating the 
equipment's behavior in a virtual environment. 


3. TECHNOLOGY 


A standard asset creation pipeline involving scanning processes will require several pieces of hardware to collect 
data, with the appropriate software to process the information. We will also consider hardware and software 
required to develop functionality and render the equipment as interactable models within a VTE. The most 
effective solutions will be discussed below. 


3.1 Software 


Each step in this process necessitates specific software tools. Initially, images must be prepared for alignment, 
followed by running photogrammetry algorithms to construct textured models from these images. The subsequent 
phase involves processing the data obtained through 3D scanning to create a 3D model that faithfully represents 
the physical geometry of the scanned subject. This model must be optimized for seamless integration into a games 
engine for virtual interaction, and various texturing solutions will be evaluated. It's common to encounter multiple 
software options for each stage of the scanning process. Some software packages bundle applications to be used 
in tandem with diverse workflows, and open-source alternatives may also be available. (see Table 1). For the 
software upcoming to be listed, the minimum processing requirements would be a 2GHz CPU and 16GB or more 
RAM. 


Table 1. Depicts a selection of software available from the Geomagic application suite, and corresponding open- 
source applications 


Geomagic software Description Open-source alternative 


Geomagic Capture Scanner specific registration software 


Geomagic Design X Rebuild CAD data reverse engineered from scans | OpenCAD 


Geomagic Control X Visualising and analysing data for quality control | Volume Graphics 


Geomagic Freeform Manipulate and manage large unstructured MeshLab 
meshes 


3.1.1 3D scanning software for 3D scanners 


To process the results of the scanning process, various specialized software solutions are employed to manage scan 
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data and enhance scenes. For this type of 3D scanning software, it's often bundled with 3D scanning hardware, 
and many developers have created their own software packages to accompany their laser scanners. Faro utilizes 
Faro Scene for scan registration and cleanup of collected geometry data, whilst Faro Zone 3D is used for tasks like 
importing high-res photos, utilizing registration targets, and performing metrics calculations within 3D 
reconstructions (Wang et al., 2019). Creaform's VX Elements is used to calibrate data collected from structured 
light scanners, while VX Model serves for more detailed scene modeling or measurements. Artec offers a 
comprehensive set of tools within Artec Studio, tailored for scan reconstruction and will be suitable for processing 
scan data. Additionally, CloudCompare (Open source) is an open-source solution that allows us to compare and 
edit point clouds or meshes (Dewez et al., 2016). It provides the capability to transform scan data to ensure 
alignment with our photogrammetry reconstructions. 


3.1.2 Photogrammetry software 


RealityCapture is renowned as one of the top choices for photogrammetric reconstruction for speed, accuracy, and 
format compatibility. Due to its exceptional capabilities, it is available at a premium price point. Other popular 
premium software includes Metashape (Agisoft) and Recap Pro (Autodesk). 


There are many free photogrammetry software, the most popular of which includes Meshroom (AliceVision) 
which has been integrated as a free plug-in for 3D processing software such as Houdini (SideFX) and Maya 
(Autodesk). Other open source solutions include 3DF Zephyr, Colmap, and Regard3D. 


3.1.3 3D mesh processing/modelling 


3D mesh processing is a fundamental component of the 3D scanning and modeling pipeline, used to manipulate, 
refine, and optimize the three-dimensional mesh models generated from various data acquisition methods, such as 
laser scanning and photogrammetry. Most have access to various plug-ins which augment and enhance the 
capabilities of the software, unlocking a multitude of functionalities that cater to diverse project requirements. 


Premium solutions include 3DS Max, Maya (both Autodesk), Houdini (SideFX), and ZBrush. Zbrush is well 
known in the professional industry for its many highly advanced tools for tasks like cleaning, healing, and texturing. 
3DS Max offers cloth, light and liquid simulations and its own scripting language (MAXScript). Houdini’s 
procedural modeling solutions may provide scalability of modular components, enhancing the flexibility and 
efficiency of the asset creation and simulation process. 


Blender is a remarkable free and open-source 3D modeling software known for its exceptional versatility. It offers 
a wide spectrum of capabilities, making it a powerful tool for cleaning up scans and repairing meshes. While 
Blender has a learning curve, due to its wide availability, there is a wealth of learning resources online for 
techniques such as hard surface modelling. There are also plug-ins which allow you to create highly detailed 
materials, like Substance Designer (Adobe), or create powerful renders of 3D objects. For tasks like modelling 
switchgear equipment, Blender’s extensive features make it an ideal choice for this purpose. 


Among other open-source solutions are weaker options such as Autodesk TinkerCAD and Vectary. These free tools 
operate directly in your web browser, however, are primarily designed to educate entry-level users. For instance, 
TinkerCAD is often integrated into 3D printing processes and has limitations, such as restricting OBJ uploads to 
models with up to 300,000 faces. 


More open-source options include OpenSCAD, FreeCAD, and Sculptris: OpenSCAD requires a bit of previous 
skill as you have to code your objects and it works with primitive geometric shapes and reads the code to modify 
and render them creating 3D models a with constructive solid geometry (CSG) which can be beneficial when it 
comes to 3D printing your projects. FreeCAD is a 3D modeling software was based on Python language which 
allows you to add new specialized features. Similarly Sculptris modifies pre-existing shapes with brushes of 
different strokes. 


3.1.4 Games Engines 


Lastly, the software we must consider is running the simulation so that it may be viewed and interacted with by a 
VR user. Unreal Engine 5 (Epic Games) natively supports VR development and also has the Quixel Bridge feature, 
giving easy access to tools and resources which may be beneficial or time saving for to the project, saving 
development labour. Similar plug-ins are available for Unity and the open-source Godot Engine. These games 
engines provide the necessary framework for creating immersive and interactive virtual environments based on 
the 3D models and assets generated during the scanning and modeling process. 
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3.2 Hardware 


Hardware plays a significant role in capturing visual data and running the software necessary for asset visualization. 
To achieve accurate photogrammetric reconstruction, the quality of the captured images is essential, motivating us 
to explore several camera options, including the Mattterport Pro, DSLR cameras, and due to their wide availability 
we will also consider mobile phone cameras. For highly accurate metrology of our scanning targets, we shall 
review Lidar and structured light scanners. Lastly, we will address hardware that may be used to provide user 
interaction within a virtual environment, such as head mounted displays (HMDs) and review processing 
requirements. 


3.2.1 Cameras and registration 
Table 2. Comparing the Megapixel value of 

Standard photographic equipment is often more accessible | various available camera devices. 
and cost-effective compared to other 3D scanning methods , ; 
like LiDAR or structured light scanning. The camera will be | Device Megapixels 

t ther i t i t t D 1 fi ‘ : 
used to gather input images to create a 3D model from Mobile iPhone 14 48MP 
photogrammetry with an accompanying texture. When 

oe : i ; Phone Pro Max 
aiming for the greatest accuracy, images with a higher 12MP 
resolution are preferred, therefore, to opt for a camera of 
superior quality is justified. 12MP 
Various cameras may differ in quality, varying in number of iPhone 11 12MP 
pixels, sensor size, and field of view. Many pixels help to 
boost the image resolution to capture fine detail, most 12MP 
noticeable when zoomed in. Different lenses can be used with f 
different DSLRs to correctly calibrate the cameras for iPhone 6 8MP 
scanning purposes. Conversely, smartphones may not have as Sasat 16MP 
many customisable options or similar fine-controls over the 
: ; ; Galaxy Fold 
image capturing process, however as can be inferred from 5G 
table 2, smartphones can often offer sufficiently high-quality 
visual data, as well as being widely available, highly portable Google Pixel | 50MP 
and very accessible. Some smartphones have a single camera, 7 
others have dual sensors, quad sensors, however, frequently, 
high-megapixel cameras being used on market smartphones | DSLR Nikon 24MP 
don’t output photos as high as the camera is capable of | Camera D3300 
because of pixel-binning. 
Cannon EOS | 2.11MP 

Using a camera will be essential to capture texture and colour ID Mark III 
detail, as well as for providing proper reference for 
registration within the 3D processing software. Sony X7R 61MP 


3.2.2 LiDAR Scanners 


Table 3. Illustrating the range in available LiDAR scanners depending on the required range of the scan. 
Manufacturer Short range Medium Range Long Range 

Artec Micro | Space Spider | Eva Lite Eva Leo Ray II 

Faro Gage FaroArm | Freestyle Vantage Focus 

Creaform R-series Go!Scan HandyScan MetraScan | MaxSHOT 3D 

Sick S300 series Tim-S OutdoorScan 3 
Leica BLK 360 RTC 360 Scanstation 


LiDAR scanners are known for their high accuracy and ability to capture intricate details. For the purposes of this 
project, they will be used for capturing complex geometries and surfaces with varying textures. Different scanners 
with different features are better suited to various scanning tasks depending on the object size and the necessary 
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scan quality. Faro are well known for their mid-long-range scanners, and Creaform have also been used for their 
handheld scanners by similar project. Other scanners include the Geomagic capture and capture mini, ideal for 
“desktop scanning” of small objects up to the size of a shoebox, as well as the EinScan product range from 
Shining3D. 


Certain scanners integrate both camera components and structured-light sensors. This grants the scanners the 
ability to gather supplementary colour information, which is particularly valuable for laser position-tracking and 
registration processes. Some artec scanners include cameras, allowing colours that the texture camera has captured 
to the 3D mesh being created. The quality of this texture is sufficient for a majority of metronomic applications. 
The quality depends of the generated geometry depends on the selection of the scanner, on the scanning distance, 
the lighting conditions, and the general execution of the scanning routine. 


3.2.3 Matterport 


Matterport is a company specializing in 3D scanning technology and software to capture and render 3D models of 
physical spaces. Their Matterport Pro Camera utilizes depth-sensing cameras and imaging sensors to create 3D 
point clouds of environments. The Matterport Pro2 3D camera offers 36MP images with a scan accuracy of +/- 
50mm, while the Pro3 improves accuracy to +/- 20mm at a 10m distance. This tripod-mounted device captures 
comprehensive visual data by rotating 360 degrees in a short time. However, there are privacy concerns regarding 
detailed models unintentionally capturing sensitive information. 


Matterport provides an iPad app for camera control, offering a "Dollhouse" view to identify unscanned areas. Users 
can navigate 3D models by selecting points within the model, making it popular for virtual property or office tours. 
They also have a mobile application using LiDAR sensors in phones to scan objects and generate 3D meshes 
in .obj format. While convenient, these scans may lack the precision needed for high-fidelity virtual assets, 
particularly in capturing intricate surface details. 


For this project, Matterport services have drawbacks. They can be costly due to hardware expenses, service charges, 
and the need for additional payment to access the metadata folder (MatterPak). The generated point cloud format 
(.xyz) lacks widespread compatibility, often requiring conversion to more universally accepted formats like .e57. 
Furthermore, Matterport's scanning technology might not provide the required accuracy and detail for the project, 
especially in capturing nuanced surface features necessary for high-fidelity 3D models. 


3.2.4 VR Hardware 


Different head-mounted displays have been designed for slightly different purposes. While most headsets come 
with controllers, not all controllers are the same. Because the head-mounted display is the hardware through which 
the student interfaces with the training environment, the controller will dictate the possible depth of interaction In 
the context of this research, the emphasis is on a cost-effective and immersive VR solution. Many VR headsets 
can run the proposed simulation. However, a mid-range specification HMD with stand-alone capabilities is 
preferred over more powerful and expensive headsets such as the HTC Vive Pro line of HMDs. This choice 
imposes certain technological limitations on the performance of the 3D virtual representation. 


For this project, the target headset will be a Meta Quest 2 VR headset. As well as its performance capabilities, the 
oculus link cable accessory allows the HMD to interface easily with a PC for development and testing purposes. 
The Pico Neo line of HMDs boasts similar specifications as the Meta Quest 2, both headsets have previously been 
used for virtual training and education purposes (Cowie & Alizadeh, 2022; Han et al., 2022; Moolman et al., 2022). 


4. EXEMPLIFYING THE FRAMEWORK: HIGH-VOLTAGE ELECTRICAL 
SWITCHGEAR 


Photogrammetry excels in capturing high-detail visual information, although as mentioned the resulting three- 
dimensional information may be susceptible to gaps, noise and inaccuracies. To use a fixed-camera or a 
multicamera set-up is feasible only for objects compatible with the rig in scale and shape, meaning they are mostly 
applicable only for small-to-medium objects. We shall be capturing objects on the site of their professional 
environment, therefore lighting conditions may not be perfect. Because of this the geometry that will result from 
our photogrammetry effort will likely have inconsistencies and not be very robust. For this reason we shall not 
rely on geometry data obtained this way, however, efforts will be made to retain any worthwhile texture 
information generated by the photoscan. 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


The 3D geometry obtained from LiDAR scan will likely be more robust however due to poor exercise of control 
over the lighting conditions, a complete and consistent scan cannot be guaranteed. As mentioned in section 2, we 
shall seek to mitigate these inconsistencies by employing the light-section method and performing multiple 
overlapping scans. To avoid incongruities caused by erroneous position tracking, a consequence of featureless 
scanning geometry, one solution emerged as notably effective: affixing ping pong balls or golf balls onto the 
surfaces of the equipment using blue tack. This addition of texture intricacies facilitated a more precise registration 
of the scanner's position during the scanning process. As the scanner traversed the modified surfaces, the intricate 
texture details provided the necessary points of reference for the algorithms to accurately determine the scanner's 
movement. Consequently, the scanner's accuracy improved significantly, and the issues of slippage and positional 
loss were effectively mitigated. 


After combining the LiDAR and photogrammetry data into a unified 3D visualization, we suggest employing 
Blender to refine and optimize this asset. In case the resulting asset falls short of the realism required for real-time 
VR, the reconstructed data will serve as a reference template for generating a new mesh. By utilizing the scan data 
as a guide, the precise measurements obtained from the scan data can inform the development of an equally 
accurate 3D object. Furthermore, we have access to suitable replacement textures to maintain our goal of 
photorealism. While this process may demand additional time and effort, it is essential for achieving an immersive 
virtual reality experience. 


Align the LIDAR model Reproject high-quality 
with the texture information 
anaes = on Pronecese cate > photogrammetry >» onto high-accuracy 
oe Artec Studio model geometry information 
Artec Eva Lite Cloud Compare Reality Capture 


individual meshes for each 
independent part of the equipment 


Using the scan data as a template 
and employing hard-surface 
Photogrammetry modelling techniques to create 


Create 
Gather high-quality Prepare photos, Blender 
photos calculate LUTs adele 
DSLR camera Lightroom i 


Reality Capture 


Recover photogrammetry textures 
or acquire sufficient texture 
substitutions from Quixel Bridge. 
Render in VR. 

Unreal Engine 5 


Fig 6:. Shows a process diagram outlining the methodology best suited to meet our needs of reconstructing a 
piece of equipment for virtual representation 


5. CONCLUSION 


This paper is structured to detail the methodological approach used in each stage, its limitations, and to empirically 
evaluate its effectiveness. By integrating advanced technologies and methodologies, this research strives to 
simplify the development of immersive training environments by reviewing and optimising the process of virtual 
representation. The framework presented is designed to methodically overcome various challenges, highlighting 
opportunities for automation of repetitious tasks associated with the necessary data processing, and facilitating a 
smooth shift from physical equipment to the production of highly lifelike virtual training environments. 
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ABSTRACT: The diffusion of Building Information Modeling (BIM) and advanced visualization technologies in 
the increasingly digitalised construction sector is fostering the development and implementation of disruptive 
approaches for workforce Health and Safety (H&S) training. Project-specific risks, safety procedures and 
information can be administered through immersive Virtual Reality (VR) experiences where construction site 
environments and activities are reproduced without exposing the trainees to real hazards. However, despite 
numerous research and industry applications demonstrating the potential benefits of these technologies, a 
standardized framework and methodology for the evaluation of VR safety training effectiveness for construction 
workers is still lacking hence hindering its large scale-adoption and recognition from policymakers. Within the 
scope of previous authors contributions on the development and implementation of BIM-based VR experiences for 
construction workers’ safety training, this paper aims to address the evaluation of their effectiveness proposing a 
novel semi-qualitative approach based on the integration of trainees’ subjective and objective data. A post- 
experience evaluation questionnaire is developed to collect trainees’ direct and qualitative feedback about the 
experience immersivity and perceived safety content transfer. Furthermore, the integration with trainees’ spatial 
tracking data is proposed to complement the qualitative feedback with the quantitative evaluation of their use of 
the virtual space for safety training purposes. The application of the presented approach in case study is currently 
undergoing and the related results will be subject of future contributions. 


Keywords: Virtual Reality (VR), Construction worker, Safety training, Evaluation, Spatial tracking, Heatmap 
visualization, Survey 


1. INTRODUCTION 


Despite recent technological innovations and policy improvements in workforce Health and Safety (HS) are 
contributing to a low but steady reduction in accident rates, the construction industry is still one of the most 
dangerous, accounting for one fifth of yearly workplace fatal accidents in the European Union alone (Eurostat, 
2022). In this regard, the growing adoption of real-scale immersive visualizations of complex site scenarios and 
construction activities enabled by Building Information Modelling (BIM) and Virtual Reality (VR) technologies 
are supporting HS managers in the early identification and mitigation of safety risks (Babalola et al., 2023). 
Moreover, since their early applications, it has been acknowledged that immersive VR simulations of project- 
specific site layouts and activities can improve the transfer of safety contents and preventive procedures to the 
trainees, while empowering their awareness in later real-site hazardous contexts (Rokooei et al., 2023). However, 
the administration of VR experiences for construction workers safety training is far from substituting traditional 
methods (e.g., slides) and is still confined to a minor share of early adopters. In fact, while economic and technical 
barriers have progressively shrunk, the lack of standardized frameworks and methods for the evaluation of the 
effectiveness of construction site VR training still stands as a major obstacle for its recognition from policy makers 
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and hence for their large-scale adoption in the industry. In this context, several contributions have shown how the 
quantification of the trainee’s ability to perform cognitive and practical tasks in the VR environment (e.g., hazard 
identification, activity simulation) can be used to objectively assess the training effectiveness (Li et al., 2018). 
Nonetheless, the qualitative evaluation of VR training, collected via post-experience questionnaires administered 
to the trainees, is often overlooked or tailored on a specific application, so that its reuse in other case studies is 
impractical. 


To address the mentioned open issues in the evaluation of VR safety training experiences for construction workers, 
this paper proposes a novel approach based on the integration of trainee’s subjective perception and objective VR 
spatial usage. The former is collected through a questionnaire administered to the trainee after the experience and 
divided in five inquiry areas. For the latter, the acquisition of the users’ spatial track during the VR experience and 
its restitution in a BIM environment through an heatmap visualization is proposed to evaluate the trainees’ use and 
understanding of the virtual environment in relation to safety training purposes. The present work stems from 
previous authors’ contributions in the development and administration of BIM-based immersive VR site 
simulations to construction workers for HS training and management purposes and is currently being tested in an 
infrastructure project case study whose results will be subject of future contributions (Getuli, Capone, Bruttini, & 
Sorbi, 2020; Getuli et al., 2018, 2021, 2022). 


2. BACKGROUND 


The use of virtual reality as a safety training technology is gaining attention in the construction industry. While 
many of the current studies mainly focus on the development of VR-based safety training programmes, it is 
noticeable that there is still a lack of research focusing on assessing its effectiveness. In this section, an overview 
of the state of the art with regard to the use of immersive virtual reality used in the realisation of training sessions 
for workers is given and, finally, the identified open problems and obstacles to implementation that this research 
aims to address are reported. 


2.1. BIM and Virtual Reality for construction workers’ safety training 


Increasing use of BIM is favouring the adoption of VR in the Archtiecture, Engineering and Construction (AEC) 
sector. Typical applications for VR include construction safety planning and training (Azhar, 2017), production 
planning and design review sessions (Wolfartsberger, 2019). 


In recent years, several studies have shown that BIM models can be used to represent construction site layouts and 
extract data for space and activity optimisation (Tao et al., 2022) and apply automated safety rule checking to 
simplify hazard recognition and assessment and risk assessment activities (I. Kim et al., 2020). Most of all, the 
spatial understanding and information visualisation capabilities provided by construction site BIM models have 
been harnessed to transfer general and project-specific HS knowledge with the implementation of VR technologies 
for the reproduction of construction site scenarios and activities for worker safety training and site planning (Getuli 
& Capone, 2018). 


Several studies investigated the use of virtual reality to allow construction workers and supervisors to have easy 
access to the BIM through a simple VR interface and also for the education of students, defining an efficient 
educational tool by integrating VR application and BIM model information to develop 3D, 4D, and 5D simulations 
(Esfahani, 2023) and offering suggestions to AEC educators and students in implementing BIM-into-VR in 
different courses. 


The use of virtual reality as a safety training technology is gaining attention in many different fieds: using a fully 
immersive VR has been shown to offer numerous benefits in terms of the effectiveness of health and safety training 
including risk assessment, machinery and/or process operation training in various industries (Toyoda et al., 2022). 


VR is gaining attention also in the construction industry, but while many of the current studies mainly focus on 
the development of VR-based safety training programmes, it is noticeable that there is still a lack of research 
focusing on improving its effectiveness. It has been shown that telepresence experienced through VR and learners' 
perception of the risk of occupational accidents significantly influence their satisfaction with VR-based safety 
training, thus influencing its effectiveness (Yoo et al., 2023). 
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2.2. Construction workers’ safety training evaluation 


The evaluation of the effectiveness of simulations developed in VR during the training sessions or the validation 
of the experience carried out is a key step to identify different problems and to be able to solve certain situations 
by fully improving the training activity for trainees, but while VR-based training has been proven to improve 
learning effectiveness over conventional methods, there is a lack of study on its learning effectiveness due to the 
implementation of training modes. It is known that BIM and VR for safety training of construction workers are 
useful in many contexts (Afzal & Shafiq, 2021), although there is still no standardized method to evaluate the 
effectiveness of safety content transfer. In fact, most of the proposed methods are specific to individual case studies 
or applications in different fields, and the effectiveness of the training administered as perceived by the trainee and 
the spatial understanding of the worksite scenario and activity in VR is often overlooked. 


The evaluation of the user experience in virtual environments can be done either with subjective methods, such us 
questionnaires (H. K. Kim et al., 2018) or with objective methods, like eye-tracking, or brain activity 
measurements (Hertweck et al., 2019). 


Regarding to the evaluation of the VR safety training of construction workers, several survey methods were 
proposed, including the creation of a questionnaire containing open and closed questions to evaluate various 
aspects of the VR interface on a scale of 1 (poor) to 5 (excellent) with subsequent collection of further data through 
observations and conversations with participants during and after the VR tests (Johansson & Roupé, 2019). 
Questionnaires are a widely used and well-known tool to collect user feedback and changes in mental states during 
various activities (Robinson, 2018) including VR applications. 


Although the use of objective measurement methods is promising, questionnaires are the most frequently used tool 
in user experience studies for VR. These questionnaires can be used as pre-, real-time, or post-assessment methods. 
In pre-surveys, the user is not immersed in a virtual environment: this can lead to a less dominant difference 
between VR and the traditional desktop presentation of questionnaires. After immersion into the virtual 
environment, however, it is important to investigate the influence of the type of questionnaire presentation on user 
experience (Safikhani et al., 2021). 


Another important topic is the movement or spatial tracking of workers during VR simulation of construction 
activities, that has been shown to be useful for ergonomic evaluation of workstations or assembly procedures 
(Getuli, Capone, Bruttini, & Isaac, 2020). 


Some studies demonstrate the effectiveness of VR simulations of assembly lines and task scenarios in an 
ergonomic approach to workplace design, aimed to optimize the production and the human-machine interaction 
(Caputo et al., 2018). Through the collection and analysis of the position tracking of a worker (Michalos et al., 
2018) and human motion and posture tracking systems is possible to obtain reliable and repeatable measures to be 
used for evaluation of activity and workplace-related working postures. 


2.3. Open issues 


Although BIM and VR for construction workers’ safety training has been proven beneficial in many studies, there 
is still a lack of a standardized method to evaluate safety contents transfer effectiveness. Most methods are specific 
to single case studies or applications; they are used to assess the training on objective quantification of the trainee 
ability to accomplish tasks in the VR environment but overlook trainee’s perceived effectiveness of the 
administered training and spatial understanding of the construction site scenario and activity in the VR, and ignore 
the subjective worker perception. 


Movement or spatial tracking of the trainees/workers during VR simulation of working activities has been proven 
beneficial for the ergonomic assessment of workstation or assembly procedures but not yet to evaluate the spatial 
understanding of the trainee in VR safety training experience that could instead be leveraged in the safety content 
transfer evaluation of HS training VR experience. 


3. PROPOSED APPROACH 


The aim of this paper is to propose an evaluation method of VR safety training experiences for construction 
workers that is based on the integration of a qualitative survey of the experience and spatial tracking data of 
participants. To this end, first a dedicated questionnaire was developed to evaluate both the immersiveness of VR 
experiences and the effectiveness of the safety content presented. Then a study on the benefits of tracking and 
visualizing trainees' use of space in the virtual environment is proposed. Finally, the proposed approach was 
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SECTION A - EXTENDED REALITY TECHNOLOGIES IN CONSTRUCTION 


validated within a case study previously developed by the authors and used for BIM-based construction worker 
safety training in VR. 


As mentioned above, the proposed approach is based on the integration of two main categories of data, as 
illustrated in Figure 1. The overall goal is to collect a set of data from trainees experiencing immersive VR training, 
aimed at evaluating the effectiveness of the virtual experience. The detailed description of the types of data to be 
collected and the specific purpose for which they need to be obtained then follows in Sections 3.1 - 3.2. 


i Result clustering and indication of 

> ID & Demographics the worker's data 

Ss Immersivit Evaluation of trainee's satisfaction 
= y and “motion sickness” symptoms 


a Contents Effectiveness u __, Evaluation of training virtual 
Qualitative Data representation 
Supervised / Administered 

a. 


Questionnaire Effectiveness of the immersive VR 


Training Experience Effectiveness ———> 
training experience 


— Future Developments —> New contents addition 


= Collection of trainees' personal 


t> Suggestions (Open Questions) impressions and ideas 


Collect information on the path 


m> Worker Spatial Track followed by trainees 


Quantitative Data 


Spatial Tracking __, Worker Temporal Use of Space __. Gathering information about the 
(Heatmap) amount of time spent in a place 


Fig. 1: VR safety training experience effectiveness evaluation approach — Data schema 


3.1. Post-experience evaluation questionnaire 


The approach adopted for the creation of the evaluation questionnaire led to the distinction of two main sections: 
the first section includes the trainee's personal data, while the second section corresponds to the actual evaluation 
part. The latter includes both a series of evaluation questions, for each of which the trainee can give a score on an 
increasing scale from one to five (where five represents maximum satisfaction), and an open "suggestions" section 
to collect trainees' personal impressions and ideas. Table 1 shows the sections and subsections with an indication 
of the purpose for which it was deemed necessary to include them, a brief description, and an explanatory example. 


3.2. Trainee’s spatial tracking and use of space visualization 


From the immersive VR experience, data are collected not only directly, by filling out questionnaires, but also 
indirectly with the acquisition of the worker's position in the virtual environment. The latter procedure involves 
capturing the position of the worker in the VR environment during the entire course of the simulation in order to 
analyse the actual use of the workspace and is recorded with a data acquisition rate of 1 Hz. The 3D point sequence 
resulting from this process is then converted for the generation of heatmaps within which the position of the worker 
is represented with a gradient of colour ranging from green to purple. 


As with the formulation of the questionnaire, mentioned in the previous paragraphs, an approach was defined to 
analyse worker movement during the immersive VR experience in order to visualize position tracking. This 
methodology involves taking into account not only the spatial coordinates, but also the time duration required for 
the performance of the experience. 
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Table 1: Questionnaire contents 


Data type 


Description 


Example 


ID & demographics 


The worker's data, which includes an identification code/number (ID) for 
classification of the completed questionnaire and an indication of the 
worker's age, role/occupation and company, with the purpose of clustering 
the results obtained. 


[Text] 


ID (Number) 


Age; Company; Role 


Immersivity 


Questions related to the user's ability to get immersed in the virtual 
experience, useful for assessing how engaged the trainee actually was in 
the virtual training scenario. It consists of an assessment of the trainee's 
satisfaction in terms of both ease of use and comfort (also related to the 
possible onset of symptoms of "motion sickness"). 


[Single rating — Likert scale] 


How much did you like the 


experience? 


[min 1; max 5] 


Contents’ 
effectiveness 


Questions regarding the virtual representation of the site and security 
content, aimed at evaluating the effectiveness of the experience against the 
objectives of the training session. 


[Single rating — Likert scale] 


How accurately does the virtual 
construction site reproduce the real 
one? 


[min 1; max 5] 


Training 
experience’s 
effectiveness 


Questions pertaining to the overall effectiveness of the experience, aimed 
at investigating the actual usefulness of the immersive VR training session 
and whether it turns out to be as comprehensive and exhaustive as a 
traditional training session. 


[Single rating — Likert scale] 


Do you think this experience is 
useful in understanding the hazards 
present on the construction site? 


[min 1; max 5] 


Future development 


Questions concerning the introduction of new automation or virtual content 
aimed at enhancing the immersive VR training experience through the 
inclusion of even virtual objects with enhanced interactivity. 


[Single rating — Likert scale] 


Would you like to be able to grasp 
and use objects from the 
construction site? 


[min 1; max 5] 


Suggestions Open question to collect personal impressions and ideas from the trainees. Suggestions 
[Text] 
Table 2: Heatmap schema contents 
Data type Description Example 
ID & demographics See Table 1 See Table 1 
Training experience The duration of the experience from start to finish, excluding the tutorial 523 sec 


duration 


that is run at the end to explain the operation to trainees, takes into account 
how long it takes the user to complete the experience. 


[time in seconds] 


Worker spatio- 
temporal track 


The graphical representation is given by a series of points that correspond 
to the coordinates of where the user was every | second and are used to 
create an image that shows the path followed by the worker distinguished 
with different colours. 


[Frequency in Hz] 
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Image with coloured representation 
of the worker’s followed path. 


[min. green, max. purple] 


4. IMPLEMENTATION 


The area in which this paper is located is part of a previous research project in which the authors' goal was to 
develop a prototype protocol for the design and delivery of safety training to construction workers, based on an 
innovative and interactive learner-centred approach. This protocol was designed and then tested for validation in 
a construction project in Italy that served as a case study for the development of VR training session content and 
implementation (Getuli et al., 2021). During and after the exploration of the site phases implemented, the worker 
undergoing the experience was invited to provide indications and opinions regarding the work environment in 
which he/she was working. This was done with the aim of both validating the different site layouts designed and 
to collect any objections and/or observations from the worker, thus enhancing their experience and giving it due 
importance within the development process of the virtual reality training experiences covered by this research 
work. 


In order to collect direct feedbacks and suggestions to better drive the decision of the development direction of the 
implementation of the proposed VR training protocol, an evaluation questionnaire (Fig.4) was administered from 
a staff member to every trainee involved after they finished their test training session. The authors drawn the 
questions in relation to the following development areas, weighting the number of the questions for each one 
according to their research objectives: 


e Immersivity: (question | to 4) Evaluation of the trainee’s satisfaction in terms of ease of use and comfort 
(also related to eventual “motion sickness” symptoms occurrence). 

ə Contents’ effectiveness: (question 5 to 10) Evaluation of the site’s and the safety contents’ virtual 
representation in respect of the purposes of the training session. 

e Training experience’s effectiveness: (question 11 to 14) Evaluation of the overall effectiveness of the 
immersive VR training experience. 

e Future development: (question 15 and 16) Evaluation of the introduction of audio and enhanced object’s 
interactivity as new features to be implemented in future developments. 


For each evaluation question the trainee can give a score on an upward scale from one to five, where five represents 
the highest satisfaction. Furthermore, an open “suggestions” section is added to collect personal impressions and 
ideas from the trainees. All the results were collected and processed in anonymous form. 


User Data 


ID: 

Age: 
Company: 
Role/Job: 


Questionnaire min - max 


> 


RAHA HAH HHH HAHAHAHA 
NANO nnranrnannanannanannn 


1) Did you enjoy the experience? 

2) Were you comfortable during the experience? 

3) Was it easy to use the viewer and controller? 

4) Did it bother you not being able to see your body in the virtual world? 

5) Are the signal arrows helpful in figuring out where to go? 

6) Does the virtual construction site sufficiently replicate the real one? 

Contents' 7) Do the work spaces indicated for the work seem adequate? 

Effectiveness 8) Are the safety and hazard spaces useful in signaling hazardous areas? 

9) In your experience, are the work procedures reproduced correct? 

10) Do you think this experience is useful in getting to know the worksite before entering it? 
11) Do you think this eperience is useful in understanding the hazards present at the worksite? 


Immersivity 


ere 12) Is this type of training useful for an inexperienced worker? 
: io . h 
Effectiveness 13) Is this type of training useful for an experienced worker? 


14) Would you prefer to see people and machines in motion ? 
Future < 15) Would you prefer to hear sounds and noises inside the worksite? 
Development 16) Would you like to be able to grasp and use worksite objects? 


RR ee ee ee ee 
NNNNNNNNNNNNNNDN WY 
WWWWWWwwwwww www w Ww 


Suggestions 


Fig. 2: Scheme of the VR training session evaluation questionnaire administered to the trainees 
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In addition to the data collected in active form through the questionnaires, an algorithm for recording the position 
of the worker in the virtual reproduction of the worksite was integrated into the VR training experience 
applications. In this way, during the training experience, with an acquisition frequency of 1 time per second (1 
Hz), the coordinates of the worker's position are recorded in relation to both the work spaces designed for the 
simulated construction activities, and in general in his movement within the construction site, so that his aptitude 
for recognising risk areas can be assessed a posteriori at both the site and activity scales. 


The interpretation of the movement traces acquired by the VR device during training in the form of sequences of 
points in the virtual space of the worksite was conducted by developing an additional analysis algorithm capable 
of reporting this information within the BIM model of the worksite and providing a graphic interpretation by means 
of heat-map visualisations. These visualisations allow the temporal dimension of the path followed by the worker 
in VR to be reported in a planimetric elaboration, distinguishing with different colours, in accordance with a pre- 
set gradient, the areas where the worker spent the longest time (violet, red) from those of short passage (green). 


The tracking of the worker's position recorded during the VR activity simulation is done by visualizing the worker's 
use of space based on a temporal heat map, which consists of a 2D representation of the 3D position points recorded 
by the worker. The time dimension of the position tracked by the worker is graphically represented through a 
colour gradient (green to red), so that a red-coloured area represents a previously recorded position (red indicates 
a position occupied longer during the VR simulation). The heat map is then automatically generated using a custom 
algorithm specially developed by the authors. 


Comparison of the generated heat map with the initial configuration of the workspace, in a BIM modeling 
environment, allows for early identification of possible planning errors; in fact, the results of the analysis of the 
obtained data are necessary for the next planning step, i.e., modification of the workspace configuration. 


d min - see 
[Duration | 0000 | 000 


Site Scenario 
Plan 


Space Usage 


Gradient 


Note 


Fig. 3: Heatmap visualization (plan view) of the trainee usage of the virtual training environment 
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5. CONCLUSIONS 


The present work, through implementation of an approach based on the integration of two main categories of data, 
with the overall objective of collecting a set of data from trainees experiencing immersive VR training to evaluate 
the effectiveness of the virtual experience, contributes to the development of a standardized method of evaluating 
the effectiveness of safety content transfer to workers that does not neglect the trainee's perceived effectiveness of 
the administered training and spatial understanding of the worksite scenario and activity in VR, as is the case with 
most of the methods proposed in the literature that are specific to individual case studies or applications and 
evaluate training based on an objective quantification of the trainee's ability to perform tasks in the VR 
environment. 


The developed questionnaire, dedicated to evaluating both the immersiveness of VR experiences and the 
effectiveness of the safety content presented, and the proposed study on the benefits of tracking and visualization 
of learners' use of space in the virtual environment, allowed us to conduct an evaluation of safety training 
experiences in VR for construction workers, based on the integration of both a qualitative survey of the experience 
and the participants’ spatial tracking data. Spatial tracking of trainees and their movement in space during the VR 
simulation of work activities proved useful for the evaluation of trainees’ spatial understanding of the VR safety 
training experience. 


Finally, the proposed approach was validated within a case study previously developed by the authors in which 
the authors' goal was to draft a prototype protocol for the design and delivery of safety training to construction 
workers, based on an innovative and interactive learner-centred approach. That protocol was designed and then 
tested for validation in a construction project in Italy that served as a case study for the development of BIM-based 
construction worker safety training in VR. During that work, a total of 6 VR training experiences were developed 
for workers, the contents of which consisted of the 3D models needed to reproduce the construction site scenario 
of the case study in the first 3 phases of construction: site set-up, installation of the external staircase and erection 
of the tower crane. At the same time, 4 training days were organized, during which the results of the proposed 
questionnaires were collected with reference to the different VR experiences carried out. 


During and at the end of each VR training session of the 6 different site phases implemented, the worker 
undergoing the experience was asked to fill out the questionnaire developed to provide input and opinions on the 
work environment they were in, for the evaluation of the session, in order to enhance the experience and collect 
useful data for the development of subsequent implementations. The results obtained from the above evaluation 
and the discussion of the related case study previously mentioned will be the subject of further publication. 
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ABSTRACT: This study explores the potential of Blockchain (BC)-enabled Digital Twins (DT) using qualitative 
semi-structured interviews to investigate the perception of stakeholders in the Architectural, Engineering, 
Construction and Facility Management (AEC-FM) industry on the relevance of BC-enabled DTs in augmenting 
stakeholder collaboration. The findings revealed that most interviewees perceived the potential of a BC-enabled 
DT in fostering stakeholder collaboration, leading to enhanced project delivery. Some participants viewed 
affordability drivers, whilst some highlighted the desire to fulfil client demands as influencing drivers for BC- 
enabled DT implementation in the AEC-FM industry. The study’s empirical findings align with evidence from 
other industrial sectors, proving that BC can ensure data integrity in a decentralised peer-to-peer framework, 
whilst DTs can leverage that data for effective and reliable decision-making. In the AEC-FM industry, these 
technologies are nascent; however, their potential integration could tackle critical issues regarding stakeholder 
collaboration and information fragmentation, leading to value generation in a decentralised and immutable 
manner. This study offers insights into implementation strategies for a BC-enabled DT collaborative environment 
and contributes to accelerating the industry’s approach to digital transformation. 


KEYWORDS: digital twins, blockchain, drivers, collaboration 


1. INTRODUCTION 


In a project-dominant sector like the Architectural, Engineering, Construction and Facility Management (AEC- 
FM) industry, collaboration is seen as a critical catalyst in boosting efficiency, promoting information sharing, 
facilitating effective decision-making, and improving the quality of production processes and project-based 
performance (Koolwijk et al., 2018). Unfortunately, collaboration in the sector is beleaguered with mistrust, 
ineffective communication, adversarial relationships, and unnecessary disputes. This has resulted in the industry’s 
fragmented nature and inconsistent activities by different participants, hindering progress towards project targets 
and creating a significant barrier to success (Li et al., 2021; Prebani¢ & Vukomanović, 2021). 


Fragmentation of the AEC-FM industry is further compounded by the industry’s complex activities due to the 
variety and volumes of entities involved (Chen et al., 2022), the duration of a project, the amounts of relevant data 
generated and dispersed stakeholders who work at the different phases of a built asset’s lifecycle. Therefore, this 
creates breeding grounds for loss of crucial information, untracked implementation, and, most importantly, the 
failure to meet client requirements. Given the industry's complexities, it is essential to identify tools or solutions 
that can improve collaboration and contribute to process efficiency. 


There has been an increasing trend in adopting digital technologies in various industries (Pour Rahimian et al., 
2022), especially with the advent of Industry 4.0, to overcome industry complexities. However, the AEC-FM 
industry has slowly adopted, used, or applied emerging digital technologies (Newman et al., 2021). Studies show 
that applying digital technologies such as Building Information Modelling (BIM), Digital Twins (DT), Distributed 
Ledger Technologies (DLT) or Blockchain (BC), Internet of Things (IoT), and Augmented or Virtual Reality in 
the AEC-FM industry can increase productivity, collaboration, quality, and efficiency (Olanipekun & Sutrisna, 
2021). Under the concept of Construction 4.0, the AEC-FM industry is making efforts to adopt such emerging 
technologies and utilise their advanced capabilities (Opoku et al., 2021). 


BIM is now widely recognised as an effective way to facilitate collaboration, communication, and management. 
It is a common tool or process that provides a high level of information depth (Gan et al., 2019), containing all 
necessary details about objects and processes for the asset's entire lifecycle and the different stakeholders involved 
(Khajavi et al., 2019). Despite the well-known benefits of BIM in terms of collaboration, issues relating to 
collaboration persist (Oraee et al., 2021); these might stem from a lack of shared collaborative culture, limited 
understanding of emerging technologies among project teams, and a preference for traditional methods (Che 
Ibrahim et al., 2019; Ibrahim & Belayutham, 2019). Furthermore, the multi-level capabilities of BIM are limited 
to implementation without the inclusion of real-time information to achieve a close-to-"as-built" or "up-to-current" 
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state of the built asset being actualised in physical form (Lu et al., 2021). Wang et al. (2020) suggest that the 
absence of real-time information exchange can lead to fragmented and discontinuous actions among participants. 
Therefore, a collaborative platform allowing real-time information sharing among all parties is necessary. 


DT technology has immense potential to facilitate real-time communication and collaboration among project 
participants (Lee et al., 2021). DT surpasses BIM by providing more "up-to-current" modelling (Rao et al., 2022; 
Xie et al., 2020). Integrating IoT sensors and DTs can also transform BIM into a dynamic tool, automatically 
updating as-built BIM (Dudhee & Vukovic, 2021). Furthermore, DTs can simulate "what-if" scenarios with AI- 
based techniques to identify potential solutions to issues such as cost overruns and schedule delays, enabling 
stakeholders to make proactive decisions (Lee et al., 2021). In addition, DT technology offers a shared virtual 
environment where project participants can visualise the project, discuss options, and ultimately make well- 
informed decisions (Zhao et al., 2022). Several studies have emphasised that creating and managing DTs is a 
continuous process relevant throughout a built asset’s lifecycle (Xie et al., 2023). However, these studies have also 
highlighted the importance of ensuring data security, reliability, and improved collaboration (Hellenborn et al., 
2023). 


BC technology is anticipated to fuel innovation in essential areas such as security, trust, and coordination with 
unified standards and protocols for information sharing. This decentralised, peer-to-peer framework is based on 
cryptographic mechanisms (Elghaish et al., 2022; Talla & McIlwaine, 2022) and could be an ideal solution to 
tackle the challenges of DT concerning security, reliability, and transparency (Kiu et al., 2022). BC enables a 
secure electronic ledger of digital information, utilising hash values to enhance security. BC uses consensus 
protocols through a decentralised network to ensure data reliability and encourages collaboration among project 
participants to record, verify, store, and extract construction project information and transactions without 
centralised data intermediaries (Kim et al., 2020). 


When it comes to the digital transformation in the AEC-FM industry, the integration of BC and DT technology 
poses the potential to ensure data integrity, security, and trustworthiness, thereby enabling more effective 
collaboration among stakeholders (Adu-Amankwa et al., 2022); however, only limited studies on its potential 
integration can be found in AEC-FM literature. Additionally, its adoption and potential integration may be a 
complex phenomenon which could be influenced by factors considered from multiple perspectives. Hence, 
investigating and exploring stakeholders' perspectives on the potential of a BC-enabled DT collaborative approach 
can contribute to designing a framework and informing policymakers towards its successful adoption and 
implementation. To help contribute to the limited studies on BC-DT integration, this current study aims to explore 
and understand the perspectives of the industry’s stakeholders on the potential applicability of BC-enabled DT 
collaborative approach using qualitative semi-structured interviews. In line with the aim, the study seeks answers 
to the following research questions: 


e What do industry stakeholders perceive as the potential role(s) of BC-enabled DT collaborative approach? 


e What factors would motivate industry stakeholders to pursue a BC-enabled DT approach to collaborative 
working? 


The remainder of the article is structured as follows: Section 2 reviews relevant literature, while Section 3 explains 
the methodology used. Section 4 presents the results and discusses the perspectives of the study's participants. 
Finally, Section 5 summarises and concludes the study. 


2. LITERATURE REVIEW 


The impact of BC-enabled DT approach on collaboration is increasingly studied, particularly in the energy, health, 
manufacturing, and transportation sectors. For instance, studies by Huang et al. (2020) demonstrated the 
effectiveness of a data management platform for a turbine using a BC-based DT approach to help curb problems 
associated with data storage, data access, data sharing, data authenticity, and overwritten data. The results of their 
approach indicate the potential of BC-based DT to guarantee data storage, access to verified data, sharing 
efficiency via a peer-to-peer network, and data authenticity through traceability. In another study, EL Azzaoui et 
al. (2021) created a BC-based DT framework for a smart health city to ensure user identity is secure and data is 
anonymously available only to healthcare providers and professionals for real-time data analytics, research, and 
personalised treatment. The framework has the potential to enhance treatment accuracy, predict future diseases, 
and control them. The authors also suggest that in a COVID-19 scenario, this framework could be used as a 
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collaborative approach to provide a secure, shared database where healthcare providers and professionals can 
access data anonymously and use that data to improve treatment and prevent future pandemics. In a recent study 
by Tao et al. (2022), a BC-based platform was proposed for better management of manufacturing services. The 
platform aims to improve trust among collaborators by ensuring accurate information sources and secure data. The 
authors explain that despite the challenges posed by the lack of interaction between physical and cyber spaces, the 
proposed platform offers a reliable solution for cyber-physical integration and addresses issues of distrust 
commonly associated with such service platforms. 


In AEC-FM literature, Lee et al. (2021) developed and tested a BC-DT integrated framework to support 
accountable sharing of project-related information among stakeholders. According to the authors, integrating BC 
and DT can ensure authentic real-time construction-related data is traceable, immutable, and shared among project 
participants without an intermediary. Similarly, studies by Teisserenc and Sepasgozar (2022) posit that BC- 
enabled DT will allow real-time monitoring of assets whilst supporting data decentralisation and enhancing data 
traceability, security, and privacy. In another study, Jiang et al. (2022) suggested that a BC-enabled DT 
collaboration platform can establish a virtual space for real-time monitoring, decision-making, and communication 
among different parties in a project. 


These studies demonstrate that integrating DT and BC can facilitate collaborative work practices by improving 
stakeholder communication, maintaining secure data integrity, and creating an authentic digital representation of 
the physical asset or process being twinned. In addition, these studies have management implications that can be 
useful for monitoring data in real time, exchanging data and information securely, and making trustworthy and 
transparent decisions. However, despite these studies, only a few have attempted to explore and understand its 
applicability in the context of the AEC-FM sector. Furthermore, there is a need to test these theoretical promises 
through empirical research, using both qualitative and quantitative methods to discover robust knowledge. This 
will help gain a more thorough understanding of their relevance within the AEC-FM industry and increase 
awareness about their potential benefits. Hence, seeking input from key stakeholders can offer valuable insights 
into the advantages of implementing a collaborative approach incorporating DT and BC technologies and the 
driving factors behind its adoption. 


3. METHODOLOGY 


A qualitative methodological approach focusing on conducting semi-structured interviews was mainly employed 
for this study. This approach was adopted due to its potential to gather comprehensive data and generate rich 
insights (Bryman, 2016). Hence, using this approach to explore and understand stakeholder perceptions about DT 
and BC technologies for the AEC-FM industry will enable participants to provide rich perspectives based on their 
knowledge and expertise. Through purposeful sampling, participants were sought through online networking 
websites (e.g., LinkedIn) and peer recommendations (Bryman, 2016). The main areas of interest were those 
participants with knowledge or experiences about DT, BC, or both in the AEC-FM industry. Nevertheless, 
flexibility was employed for some profiles which didn’t express explicit expertise in those fields but were keen to 
participate in the study because of their broader knowledge of digital transformation in the AEC-FM sector. 


3.1 Data Collection — Semi-structured Interviews 


After ethical approval was granted to engage participants for data collection, potential candidates were approached 
for their consent to participate in this study. According to Saunders et al. (2019), a range of 5-25 participants is 
adequate for qualitative interviews. Thus, this study included 19 AEC-FM industry professionals and scholars 
from Asia, Europe, North America, and Africa. 


The researchers employed a semi-structured interview protocol to organise detailed and orderly interviews, 
including open-ended questions to collect meaningful comments from respondents (Yin & Campbell, 2018). The 
interviews were conducted via web meetings on MS Teams to accommodate a wider reach of participants and 
allow effective capturing of responses for transcription. Accordingly, before each interview session, participants 
were briefed on the study’s aims, any concerns were addressed, and their consent was formally sought to officially 
commence the interview process. 


3.2 Data Analysis — Thematic Analysis 


With the aid of NVivo, the transcribed responses from participants were progressed into the data processing and 
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analysis, where available information within each response sheet was extracted into self-describing categories and 
held under thematic codes to identify the scope or variety of relevant constructs (Saldaña, 2021). Emerging themes 
highlighting the potential role of BC-enabled DT perceived by participants and the drivers or influential factors 
towards its adoption were identified. The emerging themes identified were guided by the study’s focus, as 
suggested by Bryman (2016). 


4. RESULTS AND DISCUSSION 


4.1 Demography of Interviewees 


A total of 19 participants hailing from Africa, Asia, Europe, and the Middle East were involved in this study. These 
participants were classified according to their field of expertise, which included Academics, Managers, Architects, 
Engineers, Programmers, and Surveyors. 


4.2 Participants' Perspectives on the Role of Blockchain-enabled Digital Twins 


During the interviews, individuals were asked to provide insights on a BC-enabled DT approach's role in 
collaborative working within the AEC-FM industry. Drawing from their extensive knowledge and understanding, 
they affirmed that a BC-enabled DT would be crucial for four main themed functions on effective collaboration, 
as it would provide a secure environment for accessing and sharing data, improve decision-making and promote 
trust and transparency (see Table 1). 


Table 1: Summary of Themes about the Role of Blockchain-enabled Digital Twin from Participant 


No of 

Theme Descriptions Participants 
ENHANCED DATA This theme comprises views on the role of blockchain-enabled digital twin 9 
ACCESSIBILITY in easing stakeholder access to data 
DATA SECURITY This theme comprises views on the features of blockchain-enabled digital 3 

twin in assuring the security of data 
ENHANCED DECISION- This theme highlights views on the role of blockchain-enabled digital 3 
MAKING twins in assisting with decision-making 
TRUST AND This theme comprises views on the relevance of blockchain-enabled 2 
TRANSPARENCY digital twins in ensuring trust and transparency 


Regarding the popularity of themed responses, it is observed from Table 1 that most participants expressed 
sentiments about enhanced data accessibility, followed equally by data security, as well as enhanced decision- 
making, and finally, trust and transparency. Nevertheless, the number of participants is only an indication of 
interest as some participants may be represented across themes, so in this paper, the contents that comprise these 
themes are of interest. The subsequent subsections delve into the themes with supporting evidence quotes. 


4.2.1 Enhanced Data Accessibility 


Most of the interviewees revealed that opting for a BC-enabled DT solution will create a platform where data can 
be accessed easily by project participants and stakeholders. They expect that BC’s added characteristic of a peer- 
to-peer decentralised network will empower each participant to easily access data without the need to depend on 
a central authority for data access. Some interviewees believed that accessing data is vital for collaboration as this 
would allow for timely retrieval of data or information since it is available to all parties. Extracted comments from 
a participant which align with the view of enhanced data accessibility include: 


“I think the timely retrieval of information will be a major aspect that digital twin and blockchain technology 
eventually would be able to address. ” 


“You need to focus on specific information that you want stored on the blockchain, but once you have stored the 
information, then it is available to all the parties involved. So, within the blockchain-enabled digital twin, you 
would have specific information that is useful for managing the overall lifecycle of the building and are 
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available to all parties.” 

The expectation of interviewees is corroborated by Sarfaraz et al. (2023), who revealed that BC poses a potential 
for ensuring the immutability, validity, and confidentiality of recorded data while enabling decentralised storage. 
Hence, BC’s integration with DT can ensure access to information without a mediator, providing a decentralised 
solution to project participants. Additionally, the integration of BC and DT can enhance data visibility. Thus, all 
parties involved in a ‘built asset's lifecycle can access reliable data on a shared distributed ledger (Suhail et al., 
2022). 


4.2.2 Data Security 


Data Security was also indicated as a key relevant attribute that a BC-enabled DT solution can enhance, resulting 
in improved collaboration. Most interviewees consider data security very relevant while managing a built asset’s 
lifecycle. In their opinion, a BC-DT integration will ensure data security. Interviewees further indicated that data 
generated throughout a built asset’s lifecycle needs to be protected due to the confidentiality of certain data types 
shared between project parties. In addition, interviewees highlighted that BC-enabled DT’s incorporation into the 
management of a built asset lifecycle would prevent data from being tampered with due to BC’s immutable nature. 
An extracted interviewee comment is presented below: 


“...but we want to protect this data, and not expose it to any cyber-attacks, or to expose it to people who do not 
have access to that kind of information. So blockchain data, in the end, will create big changes in how we 
manage assets, the way we manage the buildings by providing some good data, accurate data, real-time data 
about the buildings, and that is also secured.” 


The views expressed on BC-enabled DT’s capability in enhancing data security are corroborated by Shen et al. 
(2021), who confirmed via a secure data sharing framework that BC’s distributed mechanism and cryptographic 
features can guarantee a higher level of security as compared with a traditional centralised solution. Sahal et al. 
(2021) revealed that the issue of security in collaborative working could be significantly enhanced by BC-enabled 
DT due to its potential to allow for secure real-time data exchange and analysis among multiple participants. Data 
breaches in real-life scenarios highlight the importance of ensuring data security and reliability, hence the need to 
integrate BC-enabled DT throughout a built asset’s lifecycle. 


4.2.3 Enhanced Decision-Making 


The interviewed participants indicated that the unique characteristics of BC and DT will lead to enhanced decision- 
making. They highlighted that the feature of DT in providing real-time data on a built asset and BC’s capability in 
ensuring secure, immutable data within a decentralised setting can contribute to timely and accurate decision- 
making. Additionally, participants revealed that using the real-time data available in BC-enabled DT platform can 
form a reliable base to improve performance and predict future occurrences. A comment extracted from an 
interviewee is as follows: 


“The strengths of digital twin solutions, with real-time information on the built assets plus the decentralised and 
transparency strengths of the blockchain technology, I believe that will actually help you to manage facilities 
better and also, perhaps, generate more insights that can also help improve how facilities are being managed. ” 


Studies by Suhail et al. (2022) have also indicated that integrating BC and DT can ensure that actionable insights 
are driven by trustworthy data. They further pointed out that BC-enabled DT approach can help stakeholders to 
acquire more thorough and accurate insights into asset performance based on generated reliable data. Sahal et al. 
(2021) highlight that BC-enabled DT collaboration can enhance decision-making in avoiding risks and proffer 
solutions to unexpected occurrences. These studies confirm the views shared by interviewees that incorporating 
BC-DT creates an enabling environment that can facilitate decision-making in the built environment based on 
reliable data. 


4.2.4 Trust and Transparency 


Enhancing trust and transparency between stakeholders involved in a project was also a key perspective shared by 
most interviewees. A participant believed that BC adds a layer of transparency when integrated into a DT 
depending on the defined protocols and its peer-to-peer network. Furthermore, participants were of the view that 
the layer of transparency can contribute to enhancing trust amongst stakeholders. One interviewee stated: 


“... it would be that you can trust it because project parties would have the same records. Parties will not be 
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required to go through old extra records because every time data is updated, it is a universal update throughout 
the network.” 


Findings from the interviews are consistent with studies by Teisserenc and Sepasgozar (2021), who posit that 
incorporating a BC-enabled DT platform is crucial in establishing a reliable and secure data audit trail within 
decentralised ecosystems, from project initiation to completion. This shared data platform guarantees trust and 
transparency while ensuring data integrity throughout the project lifecycle. Suhail et al. (2022) suggest that by 
integrating BC and DTs, all parties involved in a product's lifecycle can effectively and efficiently manage data on 
a shared distributed ledger, thereby addressing data trust, integrity, and security concerns. In summary, a BC- 
enabled DT system will enhance stakeholder collaboration, improve transparency, and resolve trust-related 
concerns. 


4.3 Drivers of Blockchain-enabled Digital Twins 


Understanding the motivations behind the adoption of digital technologies is crucial, as these drivers can 
significantly impact behaviour and outcomes (Yang et al., 2021a). This study presents the driving factors, as 
perceived by key stakeholders, that will lead the AEC-FM industry towards undertaking and successfully 
executing BC-enabled DT approach. In this context, “drivers” refer to the motivating forces (Opoku et al., 2022) 
that will prompt stakeholders to execute BC-enabled DT projects effectively. After conducting a thematic analysis 
of interview data, five key themes emerged as the main influencing factors that would encourage stakeholders in 
the industry to adopt a BC-enabled DT collaborative approach (see Table 2). 


Table 2: Summary of Themes about the Drivers of Blockchain-enabled Digital Twin from Participants 


No of 

Theme Descriptions Participants 
AFFORDABILITY AND This theme represents views relating to the cost involved in acquiring the 8 
COST-EFFECTIVENESS approach and the potential value to be gained 
STAKEHOLDER This theme comprises views relating to stakeholder understanding and 11 
AWARENESS OF THE perceptions of the benefits 
POTENTIAL BENEFITS 
STANDARDS, PoLicies The theme represents views shared on the need for rules, guidelines, and 6 
AND REGULATIONS policy interventions. 
REAL-LIFE The theme revolves around views shared on the need to showcase practical 6 
IMPLEMENTATION approaches and demonstrate real-life examples 
CLIENT’S DEMANDS This theme encompasses perspectives shared on a client’s role and 4 
AND INTERESTS interests 

gr 


4.3.1 Affordability and Cost-effectiveness 


The participants revealed that stakeholders would be more likely to adopt a BC-enabled DT collaborative approach 
based on its affordability and cost-effectiveness. They mentioned that the affordability of the approach, especially 
for small-scale industry players, is considered the main driving force behind its development and implementation. 
In addition, some participants emphasised that a business's decision to invest in a new solution or innovation is 
primarily influenced by its cost-effectiveness, as businesses constantly strive to make returns on their investments. 
One interviewee’s extracted comment is as follows: 


“The main one is the cost to install a big platform. For a small-scale office, it is another cost. So, it only makes 
sense if the cost return works out. The equity returns on their investment in that platform should justify that 
investment and make sense if they scale up to larger scale projects for greater returns.” 


Considering the novelty of a BC-enabled DT approach, its initial cost and affordability are crucial to its potential 
adoption. Studies by Cheng and Chong (2022) revealed that perceptions about the costs of adopting and 
implementing emerging technologies can significantly influence stakeholders’ adoption decisions. Suhail et al. 
(2022) pointed out that adopting emerging technologies can help reduce operational costs, reinforce productivity, 
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and enhance operational efficiency. However, industry players still face high initial investments despite the 
potential benefits gained from their adoption (Toufaily et al., 2021), which could be an obstacle (Rind et al., 2017). 
Suhail et al. (2022) further suggest that adopting these emerging technologies would require a detailed cost-benefit 
analysis; otherwise, they may pose a significant challenge to an enterprise’s resources. 


4.3.2 Stakeholder Awareness of the Potential Benefits 


In the AEC-FM industry, BC and DT are emerging technologies many stakeholders may not yet be familiar with. 
Hence, to promote their adoption, interviewees believe it is important to spread knowledge about their potential 
benefits. Some interviewees suggest that stakeholders need to be educated and convinced about the added value 
of these technologies before they can be implemented. The interviewees also underscored the need for continuous 
advocacy and education to shift the mindset of industry players, especially in an industry that is slow to adopt new 
technology and where stakeholders may resist new innovations. A typical view shared by interviewees is as 
follows: 


“I believe people need to be convinced that this approach can save them money, make them more efficient, or 
maybe make them more competitive against their peers or give them some competitive advantage. So, they need 
to be convinced that investing in tools or new approaches like this will actually be worth it. In the long run, as a 
business, you don't want to invest your time and resources into something that would actually drain your limited 

resources. So, people need to be convinced and encouraged to adopt such technologies.” 


Perspectives shared by participants are supported by Orji et al. (2020), who indicate that having a comprehensive 
understanding of the advantages of new technological advancements can inspire stakeholders to invest in digital 
innovations. Therefore, it is recommended that stakeholders who are well-informed about these emerging 
technologies should emphasise and create awareness of their benefits, such as their ability to improve productivity 
based on an organisation's resources and traits. 


4.3.3 Standards, Policies and Regulations 


During the interviews, it was discovered that industry players are more likely to adopt new innovations if 
stakeholders or policymakers in the built environment develop and implement standards and procedures to guide 
the adoption and implementation of BC-enabled DT collaborative platforms. The interviewees also highlighted 
that implementing standards, policies, and regulations would create an environment that enables the proper 
implementation of new innovations and encourages their adoption. Additionally, some interviewees suggested that 
the government could spearhead the use of new innovations by establishing and implementing specific project 
requirements that make adoption necessary. One of the interviewees intimated that: 


“So, if the stakeholders or the policymakers in the built environment can come up with standards and 
procedures, and also give policies and regulations which motivate or promote the adoption of this technology, 
then it is imminent that professionals in the built environment would definitely adopt it. ” 


The impact of governmental policies, standards, and guidance on the perceived usefulness and complexity of new 
technologies cannot be underestimated (Cheng & Chong, 2022). Abideen et al. (2022) pointed out that 
governmental regulations and laws can influence innovation adoption and further cited the 2016 UK construction 
strategy as a notable instance of such regulations. Hence, regulatory bodies need to create policies and provide 
support to drive the adoption of BC-enabled DT platform. 


4.3.4 Real-life Implementation 


Participants suggested that in addition to communication and raising awareness about the benefits of BC-enabled 
DT platforms, it is important to demonstrate their real-world applications. Most interviewees agreed that 
developing case studies showcasing the tangible benefits of implementing such platforms would be an effective 
way to convince industry stakeholders of their value. Some interviewees also recommended creating prototypes 
to demonstrate the viability of collaborative working using these platforms. Implementing pilot projects was also 
suggested to illustrate the benefits and encourage adoption. Finally, interviewees emphasised that showing use 
cases would also educate people about the necessary resources and requirements for successful implementation, 
given that these technologies are still emerging. One of the interviewees stated that: 


“The development of case studies that showed real application of this technology. So, we have to show how these 
technologies can be applied, and I think that this is the only way in which we can prove to industry and 
academia the values and the benefits that we can realise using the integration of BC and DT.” 
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The views expressed by interviewees are further supported by Zhang et al. (2023), who suggest that pilot projects 
are a practical approach to tackling technical concerns that stakeholders may have and enhance their appreciation 
of the advantages of implementing emerging technologies. They further pointed out that many stakeholders 
hesitate to embrace new technologies because they lack real-world case studies to prove their effectiveness. Hence, 
the deployment of pilot projects cannot be discounted as a driving force during decision-making in adopting digital 
solutions. 


4.3.5 Client’s Demands and Interests 


Interviewees emphasised the role of the clients or owners as driving forces in adopting innovation. They explained 
that clients have the potential to fuel innovation through their desire to explore novel approaches and optimise 
project execution methods. Additionally, interviewees confirmed that some clients often support new strategies 
that can motivate adoption. However, some interviewees caution that clients without knowledge of new 
technologies or a clear understanding of their needs may demand unrealistic solutions. An interviewee shared the 
view that; 


“So basically, what I'm trying to say is that professionals and clients that would love to see new ways and better 
ways of doing things will easily find this approach very valuable to them.” 


Similar studies on the adoption of digital technologies corroborate the role of the client as an important driver. 
According to Yang et al. (2021b), increasing market demands for digitalised solutions in many industrial 
disciplines have driven firms to adopt digital technologies to meet client demands and requirements and maintain 
client relationships. Abideen et al. (2022) also pointed out that industry professionals are driven by clients’ 
awareness of potential enhancements in the digital built environment, and this usually motivates industry players 
to achieve competence and excellence when adopting and implementing new solutions. 


5. CONCLUSION 


This paper explored the perspectives of industry and academic professionals about the potential roles that BC- 
enabled DT collaborative approach could play in the AEC-FM industry, as well as the influencing drivers that 
would motivate such stakeholders to pursue BC-enabled DT approach to collaborative working. Emerging themes 
from respondents revealed that participants perceived that BC-enabled DT could create an enabling environment 
for enhanced data accessibility, data security, enhanced decision-making, and promote trust and transparency, thus 
making it vital for collaborative work across the AEC-FM industry. Meanwhile, it was discovered that five main 
factors emerged from participants’ responses as motivational drivers to pursue the path of BC-enabled DT; these 
are affordability and cost-effectiveness, stakeholder awareness of the potential benefits, standards, policies, and 
regulations, real-life implementation, and client’s demands and interests. 


Despite BC-enabled DT solutions being explored in other industries for collaborations, the lack of attention to its 
combined potential in the AEC-FM leads to the heart of this study’s novelty, which lies in its focus on empirically 
identifying its anticipated roles and influential drivers. By conducting semi-structured interviews with industry 
and academic practitioners, the findings are expected to resonate with readers who seek to understand key 
considerations that come into play beyond the promised potential within literature. 


A significant limitation of this study was the difficulty in finding participants with knowledge or experience in 
both BC and DT applications to respond to the study's questions. 


Based on the findings of this study, it is recommended that further empirical investigation be conducted to gain a 
holistic understanding of the capabilities and implications of adopting a BC-enabled DT approach to address the 
collaboration challenges inherent in the AEC-FM industry as it gears up towards embracing digital transformations. 
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ABSTRACT: Implementing blockchain benefits various construction management processes, such as securing 
payments, enabling traceable design process, and enhancing information transparency in supply chain. However, 
blockchain implementation in construction is still in its infancy due to weak functionality of smart contracts, which 
are self-enforceable programs allowing iteration between external data and blockchain. In the context of 
construction management that embraces complex and dynamic business processes, smart contracts are currently 
designed based on specific and isolated functional requirements without considering the connection and execution 
logic between these functions. It leads to inefficient collaboration and even execution errors, thereby corrupting 
data quality and even causing business failure. Therefore, this paper proposes a Blockchain-BPMN (Business 
Process Model and Notation) integrated (BBI) framework for construction management. The framework poses 
two contributions. First, a BPMN-driven method is developed to design smart contracts supporting executing 
linked and logically connected business activities. Second, an access control strategy is integrated into smart 
contracts to safeguard the accessibility of sensitive business data in a blockchain environment. The BBI framework 
is validated in an actual BIM design collaboration scenario, and results show its feasibility and computational 
performance are acceptable. Several aspects for improvement and future directions are discussed in the end. 


KEYWORDS: Smart contract; Blockchain; BPMN; Construction business process 


1. INTRODUCTION 


Data security is becoming an increasingly concerning issue in the construction industry. With the higher level of 
digitalization in construction, it is inevitable that data security becomes more important (Garcia de Soto et al., 
2022). Several examples highlight the vulnerabilities that exist in the construction business process, such as the 
lack of records for Building Information Modeling (BIM) changes, fragmented and unaccountable supply chains, 
and insecure payment systems. These vulnerabilities can result in various negative consequences for construction 
projects, from delays and cost overruns to compromised safety and quality. Therefore, it is crucial for the 
construction industry to prioritize data security, traceability, and process automation. By implementing robust 
security measures, ensuring transparency and accountability in the supply chain, and automating processes, 
construction companies can mitigate the risks associated with data breaches and cyberattacks. This not only 
protects sensitive information but also improves overall project efficiency and reduces the likelihood of errors and 
disputes. The construction industry must recognize the importance of data security and take proactive steps to 
safeguard their systems and information. 


Blockchain and smart contracts have emerged as potential solutions to various challenges in different industries, 
including the construction sector. Blockchain is a decentralized digital ledger that records and verifies transactions 
across multiple computers, ensuring transparency, security, and immutability (Leng et al., 2022). On the other hand, 
smart contracts are self-executing contracts executed in the blockchain platforms with the terms of the agreement 
directly written into lines of code (Ye et al., 2022a). These two technologies have the potential to revolutionize the 
way transactions are conducted and contracts are executed. 


In the construction industry, blockchain and smart contracts have been explored as solutions to various problems. 
One example is payment management. Smart contracts can automate the payment process, ensuring that 
contractors and suppliers are paid promptly based on predefined conditions (Sigalov et al., 2021; Ye & König, 
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2021a; Ye et al., 2020). Another example is in the field of BIM change record in the design phrase. By 
implementing smart contracts, any modifications to the design of BIM model can be automatically updated to 
blockchain, ensuring that all stakeholders are aware of the changes and can plan accordingly (Xue & Lu, 2020). 
This can help streamline the design phase and minimize the risk of errors or miscommunication. Supply chain 
management is another aspect of the construction industry that can benefit from blockchain and smart contracts. 
With complex supply chains involving multiple suppliers, contractors, and subcontractors, ensuring transparency 
and traceability becomes crucial. Blockchain can provide a secure and immutable record of every transaction and 
movement of materials, enabling stakeholders to track the origin, quality, and location of construction materials, 
where smart contracts can automate the procurement process, ensuring that materials are ordered and delivered on 
time, and payments are made only after satisfactory delivery (Elghaish et al., 2023). Therefore, blockchain and 
smart contracts have the potential to revolutionize the construction industry by addressing various challenges and 
improving efficiency. As more research and implementation examples emerge, these two technologies are not just 
buzzwords but practical solutions that can drive innovation and transformation in the construction industry. 
However, the current smart contract implementation is isolated, neglecting task sequence execution, leading to 
limited automation. Furthermore, the use of code in smart contracts makes them challenging to understand for 
non-programming participants in the construction industry. 


Therefore, the main objectives of this study are to explore the possibility of incorporating blockchain and smart 
contracts into construction business processes and to improve the automation level using smart contracts in the 
construction field. To achieve these, we propose a novel BPMN-driven approach aimed at enhancing the efficiency 
and automation capabilities of smart contracts. The study aims to develop a Blockchain-BPMN integrated 
framework for connected smart contract execution, enabling seamless interaction and collaboration among 
different smart contract functions, thereby automating complex construction workflows. Additionally, a smart 
contract access control strategy is proposed to manage data accessibility, ensuring that sensitive information is 
only accessible to authorized parties while maintaining transparency. 


2. LITERATURE REVIEW 
2.1 Blockchain and smart contracts in construction 


In the construction industry, many blockchain and smart contract reviews were conducted and their adoption to 
the construction industry were highlighted. For example, the applications of blockchain and smart contracts in 
construction were summarized by Li and Kassem (2021), which mainly focusing on information management, 
payment, procurement and supply chain management. Another smart contract review was conducted, which 
pointed out that 81 smart contract-related papers were published from 2014 to 2021 in the construction industry, 
focusing on the areas of contract and payment, supply chain and logistic, and information management (Ye et al., 
2022a). 


One of the critical pain points in the construction industry is the inefficiency and lack of transparency in payment 
processes. Smart contract-based payment systems have emerged as a promising approach to tackle these issues. 
Ahmadisheykhsarmast and Sonmez (2020) investigated the implementation of smart contracts in construction 
projects to enhance security in payment processes. Their study demonstrated that smart contracts could automate 
payment release based on predefined conditions, reducing payment delays and disputes significantly. Building 
Information Modeling (BIM) is a critical component of the construction process, allowing stakeholders to 
collaborate visually and efficiently during the design phase. Integrating blockchain technology with BIM has the 
potential to enhance data management and collaboration (Ye et al., 2022b). Tao et al. (2021) explored the use of 
blockchain and smart contracts for BIM-based collaborative design in construction projects. Their research 
demonstrated that the decentralized and immutable nature of blockchain improves data sharing and coordination 
among stakeholders during the BIM design phase. 


The current use of smart contracts faces limitations, including their isolated nature and the absence of a coherent 
approach to developing logical methods that align with construction business processes. In Fig. 1, a comparison 
highlights the difference between the current practice and the potential capabilities of smart contracts in 
construction workflows. Presently, various tasks are handled by independent smart contracts, necessitating users 
to determine which contract to engage. For instance, in Fig. 1 (a), tasks are individually managed by separate smart 
contracts without intrinsic connections, which may cause execution or business logic errors and decrease the 
automation level of smart contracts. The current practice contrasts with practical expectations where inter- 
dependencies should be embedded. Fig. 1(b) illustrates the expectation for execution tasks within a smart contract, 
which enhancing the automation potential of smart contracts. 
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(a) Current ! (b) Expect 


Task2 Task3 


1. Execution/business logic 
errors; 2. Low automation level 


Fig. 1: The limitation of the current usage of smart contracts. (a) Current practice of smart contracts; (b) 
Expected usage of smart contract 


2.2 Combination with BPMN for construction business process management 


Business Process Model and Notation (BPMN) is a standardized graphical representation language used to define 
and visualize business processes (Object Management Group, 2013). In the realm of construction project 
management, BPMN offers a range of benefits. It provides a clear and intuitive visualization of complex 
construction workflows, enabling stakeholders to understand processes more easily (Borrmann et al., 2018). This 
enhances communication, collaboration, and decision-making among project teams. Furthermore, BPMN can 
facilitate the identification of bottlenecks, inefficiencies, and potential improvements in construction processes, 
contributing to more effective project management and resource allocation (Borrmann et al., 2018). 


BPMN’s suitability for connecting smart contracts is noteworthy. With its versatile graphical representation, 
BPMN can model and illustrate the interactions between different smart contract functions effectively. This enables 
the design of intricate sequences of smart contract executions, facilitating the automation of multi-step processes 
(Ye & König, 2021b). For example, Lopez-Pintado et al. (2019; 2022) introduced and implemented a blockchain- 
based BPMN execution engine called Caterpillar to generate smart contracts using a BPMN-to-solidity compiler. 
Di Ciccio et al. (2019) proposed an approach to translate from BPMN process models to smart contracts using 
Caterpillar tool, execute processes through smart contracts, and track activities in the Ethereum blockchain. The 
other existing study in the construction industry analyzed the possibilities of combining blockchain and BPMN 
choreographies, and proposed a framework to identify the state of each process in BPMN processes and 
choreographies (Spalazzi et al., 2021). Additionally, BPMN can be coupled with access control mechanisms, 
ensuring that only authorized parties can engage with specific smart contracts or process steps. This integration 
strengthens security and transparency, both crucial aspects in the construction domain. 


However, the current research focuses on direct translation from BPMN to smart contracts, without considering 
the real-time connection and visual execution of BPMN and smart contracts. One reason is the relative novelty of 
blockchain and smart contract adoption within business processes. As a result, the exploration of BPMN’s 
capabilities for this purpose is limited. Moreover, incorporating access control mechanisms in this context presents 
challenges, as construction projects involve various stakeholders with differing levels of authorization. Balancing 
data visibility and access becomes intricate, given the diversity of involved parties and the need to maintain data 
integrity and security. 


This paper addresses these gaps by presenting two significant contributions. Firstly, it introduces a groundbreaking 
Blockchain-BPMN integrated framework designed for the connected execution of smart contracts. This framework 
allows for the seamless interaction and execution of interconnected smart contracts, enabling the automation of 
complex construction workflows. Secondly, the paper proposes an innovative smart contract access control 
strategy that caters to the diverse data visibility needs of various stakeholders in the construction process. These 
contributions collectively aim to revolutionize the construction industry's approach to business process automation, 
improving efficiency, transparency, and collaboration through the integration of BPMN and blockchain-powered 
smart contracts. 
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3. METHODOLOGY 


The research methodology, illustrated in Fig. 2, comprises three key steps. Step 1 involves designing a Blockchain- 
BPMN integrated (BBI) framework by firstly defining the framework’s logic and workflow, and then explaining 
how the connected smart contracts are executed in the framework. Step 2 focuses on developing a strategy to 
control data access. This includes determining who can see which data level based on permissions and setting rules 
for accessing data stored both inside and outside of the blockchain. In Step 3, the logical smart contract algorithm 
is developed, by defining the mapping from BPMN to smart contracts, and developing the smart contracts based 
on the BPMN execution logic. The output of Step 1 is a concept explanation of the BBI framework, and its 
functionalities are detailed further by the development of the data access control strategy in Step 2 and the logical 
smart contract algorithm in Step 3. Step 4 tests the feasibility of the BBI framework using a case study. This 
includes building and putting the BBI framework into action, and then assessing its effectiveness using a BIM- 
based design collaboration scenario. 


STEPS METHODS OUTPUTS 


| Step 1 H Designing a BBI framework integrating blockchain and BPMN } 


f : | Explain mechanism of BBI 
eee and workflow Ean] for connected smart Concept of BBI framework 
9, . contract execution is explained 
L puu a 


Step 2 } Developing an access control strategy for task execution 


‘ai Set access control rules for the ’ ‘ 
Define data visible levels ; [An access control strategy 
based on user permissions gata OES ee andion ->| is developed to control 
isa SE chain data visibility in BBI 


: 5 


A logical smart contract is 


| Step3 Developing the logical smart contract algorithm 
Define the mapping from Develop the smart contract ` developed to improve the 
BPMN to smart contracts P j 
logic ‘ 
` execution : 


based on the BPMN execution 
automation of task 
Step4. +—> Testing the ability of BBI framework using a case study | 


Implement the BBI Use BIM-based design 
framework integrating »| collaboration case to test the The feasibility of the BBI 
blockchain and BPMN framework framework is validated 


Fig. 2: Research methodology 


3.1 Blockchain-BPMN integrated framework 


The overview of the proposed BBI framework is shown as Fig. 3. Process-embedded smart contract list (PeSCL) 
is proposed in this paper for storing all the process-required data in a construction project to be executed via the 
smart contracts, which includes such as project information, participant information, process units, and the linkage 
to the BIM model. The PeSCL and the BIM model are the inputs for the framework (Fig. 3@)), where BIM model 
is displayed in a BIM viewer, and PeSCL is displayed as table. They are linked via the GUID of each BIM elements 
that stored in the PeSCL (Fig. 3(2)). These two input data are further linked with the construction business process, 
which is represented by BPMN (Fig. 3@)). Each BPMN task is mapped to a corresponding smart contract function. 
Real-time process visualization and execution are then realized (Fig. 3(4)). During the task execution, an access 
control strategy is developed for controlling different data permissions for different participants of the construction 
project. Such strategy controls the visible level of both on-chain and off-chain data (Fig. 3(5)). The logical smart 
contracts automatically execute the tasks represented via BPMN with their execution logic, and their transactions 
are stored in the blockchain (Fig. 3@)). Such framework can not only visualize BIM and real-time construction 
business process execution status with an improvement of business process automation, but also build trust and 
realize data access control. 
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Fig. 3: The overview of the proposed Blockchain-BPMN Integrated (BBI) framework 


3.2 On-chain and off-chain access control strategy 


The access control strategy can be divided into two parts, namely on-chain and off-chain data access permission 
(see Fig. 4). All the project-related documents are stored in the off-chain storage, where the permission level of 
each document is also stored. When users request to check or modify the off-chain data, their username (as 
identifier) and permission level will be firstly checked. All the sensitive or valuable data are stored on-chain (i.e., 
on the blockchain), and only specific user can call specific smart contract functions. When users request to view 
or modify the blockchain (BC) data, their BC account identifier will be firstly checked in their called smart contract 
function for viewing or modifying request. Further detailed example about the access control strategy is shown in 
the case study section (Section 4). 


User (Frontend) 
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Access Ni Login account 


permitted „| | Return data based on user role, 
data 
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IEN 
Display corresponding results 
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itted data 
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Fig. 4: Access control strategy among users, smart contracts, blockchain, and off-chain storage 
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3.3 From BPMN to logical smart contract 


To solve the problem of current isolated smart contract usage, this paper proposes a way of implementing smart 
contracts, which is defined as logical smart contracts. Such logical smart contract, which is interpreted based on 
the BPMN components, not only automates the BPMN tasks based on the execution logic in BPMN, but also 
further automates the inner actions of some specific BPMN tasks. The mapping from BPMN to the logical smart 
contract is shown as Fig. 5. In BPMN, there are three main components, namely participants, tasks, and flows. 
BPMN participants are interpreted into P.State Variables and P.Modifiers, where the former is used to store the 
information of the participants (such as identifier and role) and the latter is used to restrict that only specific 
participant can execute specific smart contract function (for access control strategy). BPMN tasks include task 
participants and inner actions, where the former is from the BPMN participants with the specific task execution 
permission and the latter is to indicate what actions could be done within the tasks. BPMN flows are used to 
indicate the execution logic of BPMN tasks, which is further interpreted into F.State Variable, Enum, F.Modifiers 
and F.Functions. Further example of the generated logical smart contract is shown in the case study. 
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Fig. 5: Mapping from the logic of BPMN to the logical smart contract 


4. CASE STUDY 


Validation of the proposed BBI framework was achieved through the implementation of a decentralized application, 
where a React frontend was seamlessly integrated with the Ethereum blockchain. This implementation was 
rigorously tested against a real-world BIM design collaboration case from Tao et al. (2021). In Fig. 6, the 
comparative analysis between their original smart contract solution and our enhanced logical smart contract is 
presented, revealing the improvements achieved by integrating process logic into the smart contract. Following 
this comparative analysis, the testing results of the case within our implementation are diligently presented in Fig. 
7, providing a comprehensive view of the real-world applicability and efficacy of our proposed framework. 


In Fig. 6(a), the BPMN workflow was meticulously designed based on the case elucidated by Tao et al., 
encompassing three distinct roles of participants and a comprehensive set of seven BPMN tasks. Fig. 6(b) depicts 
the original smart contract solution, which was rather limited, accommodating only two smart contract functions, 
namely UPLOAD and INQUIRE. In this solution, real-time process status updates were unavailable to participants, 
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and the execution of smart contract functions required manual intervention from the participants themselves, 
lacking the seamless automation sought after. Conversely, in Fig. 6(c), our proposed logical smart contract solution 
represents a significant advancement. Execution logic and access permission are ingeniously incorporated directly 
into the smart contract, improving its ability for automation and collaboration. Furthermore, execution permissions 
are elegantly restricted by this design, significantly mitigating the potential for errors and associated challenges, 
thus providing a more robust and efficient solution for BIM design collaboration in the smart contract system. 


Fig. 7 serves as a comprehensive visual representation of the final case outcomes, encapsulating a multitude of 
crucial components that harmoniously contribute to the success of our BBI framework. Within this illustrative 
figure, Fig.7(a) provides an immersive real-time visualization of the BPMN process status, meticulously denoting 
task execution through distinct markers, with yellow indicating completed tasks and green designating those 
currently in progress. This dynamic display offers stakeholders an intuitive and up-to-date overview of project 
progression. In Fig.7(b), the logic of BPMN tasks is revealed through the logical smart contract functions, 
showcasing how these functions govern the execution of critical project activities. 


Meanwhile, Fig.7(c) introduces the access control strategy with strict user authentication via blockchain login 
accounts. This authentication process ensures varying levels of data visibility across the frontend, catering to the 
unique needs and privileges of individual users. Furthermore, it plays a pivotal role in safeguarding sensitive smart 
contract functions, allowing access only to authorized stakeholders. The BIM model is displayed in Fig.7(d) 
through the BIM viewer, providing stakeholders with a comprehensive visual representation of the project’s 
architectural intricacies. This model is intimately linked to the PeSCL shown in Fig.7(e) , which is used for storing 
and listing project-specific details. Finally, Fig.7(f) focuses on on-chain data and blockchain-related information, 
demonstrating the transparent and immutable nature of data stored in the blockchain. Together, these components 
provide project participants with valuable insights into the structural and organizational dimensions of the project, 
offering a holistic view of how blockchain-based smart contracts enhance transparency and data integrity in the 
BIM design collaboration case. 


BIM Design Collaboration Case 


(a) BPMN workflow 
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Fig. 7: Results of the BIM design case in the BBI framework 


5. CONCLUSION 


In summary, the escalating concern for data security in construction business processes has spurred the search for 
innovative remedies. While the potential of Blockchain and smart contracts is evident, they grapple with substantial 
drawbacks, notably the isolation of current smart contract implementations and the difficulty non-programming 
participants face in comprehending smart contract codes within the construction industry. This study has 
confronted these challenges head-on by pioneering a BPMN-driven approach, seamlessly integrating blockchain 
and smart contracts while remedying issues related to automation and participant understanding. The introduction 
of the Blockchain-BPMN integrated (BBI) framework marks a significant advancement, revolutionizing the 
automation of intricate construction workflows, fostering seamless collaboration, and enhancing visualization 
across diverse smart contract functions. The strategic inclusion of an access control strategy fortifies data security 
while preserving transparency. The innovation of logical smart contracts, enhancing automation by embedding 
task execution flows, is a notable contribution. The feasibility and efficacy of the BBI framework are demonstrated 
through practical implementation and rigorous testing in a BIM design collaboration scenario. However, a 
lingering limitation pertains to the immutability of smart contracts. Future endeavors should focus on enriching 
the framework’s adaptability and addressing this issue by incorporating an upgradable feature into the proposed 
logical smart contracts, paving the way for more refined solutions to the multifaceted challenges inherent in the 
construction field. 
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ABSTRACT: A crucial action of COVID-19 combat is the quick design and building of makeshift hospitals (MHs). 
Although adopting building information modeling (BIM) promotes the digitalization and communication of design 
collaboration, data security vulnerabilities (e.g., lacking traceability and transparency) are detected and have 
inevitably impeded the efficiency and productivity of the MH project. Such problems often lead to rework and 
unnecessary disputes, wasting valuable time on projects requiring ultra-fast construction speed. The emerging 
blockchain technology offers an immutable and traceable collaboration environment. However, limited studies 
have integrated blockchain in the BIM design process, especially design in emergency projects like MH. Therefore, 
this paper proposes a blockchain-enabled collaboration (BEC) framework for fast and secure BIM design. The 
framework is illustrated in an actual MH project in Hong Kong, and results show that: (1) it supports secure and 
automated BIM data exchange and (2) it saves 23 % of the time in a design coordination case. 


KEYWORDS: Covid-19, BIM, Blockchain, Makeshift Hospital, Smart Contract. 


1. INTRODUCTION 


The COVID-19 pandemic seriously threatens the global public medical and health system. By January 2023, there 
had been over 750 million confirmed illnesses and over 6 million documented fatalities (WHO, 2023). 
Governments launched many emergency construction projects to meet the enormous demand for patient quarantine 
facilities, establishing multiple makeshift hospitals (MHs) (Luo et al., 2020). A makeshift hospital, also referred 
to as Fangcang hospital or mobile cabin hospital, is a type of emergency service that assembles modular medical 
units to form a temporary hospital to address the ongoing scarcity of medical supplies. Two examples in China, 
Huoshenshan hospital and Leishenshan hospitals, built and delivered in 10 days and 18 days, respectively, have 
proved that the suppression of epidemic outbreaks is made possible by establishing temporary MHs, which are 
crucial for lifesaving and improving recovery rates (Zhou et al., 2022). 


MHs have challenging tasks, intricate specifications, and condensed design and build times. Therefore, building 
information modeling (BIM) technology is introduced because these criteria present additional difficulties for 
design work (Tan ey al., 2021). BIM is essential in streamlining coordination by combining organizational and 
procedural data into a shared repository. According to (Luo et al., 2020), BIM benefits the MH design in three 
aspects. First, MH design needs significant optimization work by reducing changes during the building phase by 
visualizing all conflicts. Additionally, designers can use integrated BIM data to speed up construction for parallel 
scheduling, improving design simulation, like predicting the transmission of viruses in negative pressure wards. 
Finally, BIM is a platform everyone can use to easily create, exchange, and collect digital data to promote 
collaboration. 


However, some data security concerns still exist in the MH BIM process, causing reworks and time waste, 
hindering efficiency and productivity. For example, in contrast to general construction projects in which BIM 
models from various disciplines are created sequentially, models in MHs are made independently and concurrently. 
When BIM information is synchronized amongst teams, there is a danger of data omission and inconsistency, 
which eventually leads to design errors because there are no referenced models for the untransparent collaborative 
environment. Besides, there are no routine meetings for BIM coordination due to time constraints. Instead, a "peer- 
to-peer (P2P)" communication model is used, wherein project members contact accountable designers directly and 
personally through virtual meetings or phone calls. However, because it lacks verifiable and traceable records, this 
method can occasionally be disorganized and ineffectual, making collaboration time-consuming. 


Blockchain is a distributed database technology that leverages peer-to-peer networks, cryptographic hash 
algorithms, and consensus mechanisms to secure data integrity (Tao et al., 2020). The blockchain prevents a single 
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administrator from controlling the entire network because each peer keeps an identical and append-only data copy 
(also known as the blockchain ledger). The benefits of blockchain in construction payment (Das et al., 2020), 
supply chain management (Tezel et al., 2021), and design collaboration (Pradeep et al., 2021) have been 
preliminarily investigated. However, incorporating blockchain into MHs is still in its infancy for two barriers: (1) 
The mechanism of embedding blockchain in an MH BIM process is unclear. Problems like which activities need 
blockchain and what data should be transferred are still awaiting solutions. (2) Key technical elements, like smart 
contracts supporting blockchain data interaction, have not been developed. 


Therefore, this paper proposes a blockchain-enabled collaboration (BEC) framework for fast and secure BIM 
design. The primary scientific contributions of this paper are: (1) identified three main security risks in the MH 
BIM design process, (2) proposed a BEC framework to alleviate harms brought by the security issue, thus 
improving design efficiency, and (3) developed three smart contracts to automate the data exchange in a blockchain. 
Finally, the framework is demonstrated and evaluated in an actual MH project in Hong Kong. 


2. METHODOLOGY 
2.1 Identification of BIM security vulnerabilities in MH 


Locating the risks is the fundamental step in developing the BEC framework. In this study, one BIM manager and 
two BIM designers who had a direct hand in an MH project and were familiar with the intricate steps of BIM- 
based design were interviewed four times each. Data collection took place over one month in 2022, from March 
to April. A BIM collaborative design workflow is in Figure 1. 
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Fig. 1: BIM-based design workflow in MH 


Three factors account for how the workflow quickens the BIM design process. The parallel design is the first 
benefit. Several models are produced simultaneously in this MCH project, reducing time compared to normal 
construction projects in which BIM models from several disciplines are developed sequentially. P2P 
communication is the second advantage. There are no scheduled meetings for BIM coordination because of 
scheduling constraints. Instead, a P2P communication model is used, in which accountable designers are contacted 
directly and personally by project members during virtual meetings or phone calls when design issues arise. This 
is an effective method for facilitating communication since it prevents pointless meetings and enables responsible 
individuals to identify design concerns promptly. Pre-revision refers to addressing design or BIM model issues 
before the client issues formally amended drawings or documentation, saving waiting time. Content marked green 
in Figure 1 indicates the steps that can be improved by blockchain. 


However, security flaws are also caused by such unique collaboration methods. The risks and merits provided by 
blockchain are presented in Figure 2. For parallel work, the absence of a visible path for updating information 
causes differences across models, mistakes in design, and even delays. Blockchain guarantees data transparency 
and consistency thanks to its distributed architecture. In the course of the discussion, the BIM manager bemoaned 
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the fact that they frequently wasted time looking up design records to identify problems because ECPs lacked 
adequate logging facilities, leaving out certain previous data. Additionally, several designers collaborate to develop 
a model in a "relay race" fashion, with each team member finishing a section of the model in turn. The designer 
contacted during the P2P collaboration may not be aware of the specifics of the issue if they recently took over the 
contract from another designer (the issue proposer). All design logs and actions can be immutably recorded using 
blockchain to improve collaboration traceability. 


Security vulnerabilities Blockchain merits 


Parallel design 


Each model is built independently without 
referenced models, leading to discrepancies 
among BIM data in different domains 


No transparent environment to synchronize 
updated information within or cross domain, 
resulting in design error, rework and even delay 


Peer-to-peer communication 


Designers contact accountable people directly. 


Lacking traceable design records often lowers 
the efficient of communication and design work 


Pre-revision mechanism 


Automation 


Unforgeability 


Fig. 2: Blockchain merits in MH 
2.2 Blockchain-enabled collaboration (BEC) framework 


Pre-revision requires permissions from multiple 
parties. Physical signing is time-consuming, and 
it is hare to verify signature authenticity 


The BEC framework in Figure 3 shows the logic of implementing blockchain in an MH BIM design and offers a 
technical stack for data flow. The seven-layer architecture serves as an example of the technical stack used to create 
a BEC framework. Through the application layer, which provides an interface for user management, users in the 
user layer can sign up for the blockchain. User operations may be documented in the blockchain in this manner. 
These "one-stop" registration processes make it easier for consumers to use blockchains because they simply need 
to complete standard registration processes. The interaction between the front-end inputs and the blockchain 
databases on the backend is carried out through smart contracts at the smart contract layer. Blockchain smart 
contracts, machine-readable pieces of code, can self-execute business logic (e.g., payment) on blockchain ledgers, 
thus empowering automatic and unforgeable collaboration actions (Li et al., 2022). Event transactions will 
represent collaboration actions, such as permission transactions (metadata of permission execution), issue event 
transactions, and model event transactions (metadata of events pertaining to BIM models). Large-sized data like 
BIM models and non-graphic documents are also stored in off-chain databases, which can be local servers or cloud 
servers like Autodesk BIM 360. This is in addition to on-chain data (transactions). This layer has a digital signature 
generator that generates digital signatures using the 256-bit secure hash technique (SHA-256). These fingerprints 
are spread across the blockchain network. The infrastructure layer serves as the foundation and provides necessary 
services, including cloud data storage, operating systems, and hardware (e.g., GPU and CPU). 
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Fig. 3: Architecture of BEC framework 
2.3 Smart contracts for BIM data interaction 


In the BEC framework, three smart contracts have been coded. The pseudocodes are shown in Figure 4. A 
transaction comprising the model ID, fingerprint, user ID, and timestamp will be transmitted to the IS (information 
sharing) smart contract whenever a user uploads a new model (or issue). An endorsing node will verify this 
transaction by examining the legality of the transaction (Step 1). Illegal transactions will be denied, or they can go 
through and be sent back to the smart contract with endorsing signatures. An example of an illegal transaction 
would be one where the transaction data model is not required to follow a key-value structure (Step 2). In Step 3, 
the smart contract hands the transaction off to an ordering service that will chronologically group the transactions 
that occurred within the specified time into a new block. All blockchain nodes will receive it through the smart 
contract, and upon obtaining it, they will verify the block (Steps 4 and 5). The ledger will be updated with 
confirmed blocks as immutable records. 


The rationale behind PE (permission execution) smart contracts is similar. One distinction is the smart contract 
input, and the other is the status results. After reviewing an issue, a project member (such as a domain manager) 
can decide. An endorsing peer will receive a new transaction from the PE smart contract along with all essential 
metadata (such as issue ID, issue decision, reviewer comments, etc.) so they can verify that the input complies 
with the regulated data model. Then, the project member will update the blockchain world state, a database that 
displays the most current status of various files and is built on top of the blockchain ledger. Besides, the PE smart 
contract creates and stores the digital signatures of the user to ensure the integrity of outcomes, saving time by 
doing away with the laborious physical permit process. The HQ (history query) smart contract will retrieve 
previous transactions (e.g., file ID) based on the provided key. 


321 


Pseudocode of information sharing (IS) smart contract 
Input: IS transaction data, IS function 
Output: New block in blockchain 
Step 1: IS function proposes the transaction for legality checking 
if input data model = pre-defined IS data model 
Function name = IS 
generate endorser signatures 
else 


validation fails 
Step 2: Get back endorsing signature 
return pre-execution results € transaction data||read-set||write-set||signatures 
Step 3: Sent to ordering service for new block packing 
12 new block € pre-execution results||block hashes||Merkle tree||timestamp 
13 Step 4: Broadcast the new block to all nodes 
14 Step 5: Verify and add to ledger 
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15 if validation results = true 

16 return new block € block number “N+1” 

17 notification message € “Transaction has been successfully shared” 
18 else 

19 IS transaction sharing fails 

20 Step 6: Notification 

21 End 


Pseudocode of permission execution (PE) smart contract 


1 Input: PE transaction data, decision, PE function 

2 Output: New block in blockchain, issue status change 

3 Step 1: PE function proposes the transaction for legality checking 

4 if input data model = pre-defined PE data model 

5 Function name = PE 

6 generate endorser signatures 

T else 

8 validation fails 

9 Step 2: Get back endorsing signature 

10 return pre-execution results € transaction data||read-set||write-set||signatures 
11 Step 3: Sent to ordering service for new block packing 

12 new block € pre-execution results||block hashes||Merkle tree||timestamp 


13 Step 4: Broadcast the new block to all nodes 
14 Step 5: Verify and add to ledger 


15 if validation results = true 

16 return new block € block number “N+1” 

17 notification message € “Transaction has been successfully shared” 
18 New issue status € reviewer ID||decision 

19 else 

20 PE transaction sharing fails 

21 Step 6: Notification 

22 End 


Pseudocode of history query (HQ) smart contract 

1 Input: Target file ID, HQ function 

2 Output: Transactions details of a given ID 

3 Step 1: HQ function checks if the given ID exists in blockchain 
4 if ID exists 

5 query results € transaction value of this given ID 

6 

7 

8 


else 
reject the query 
End 


Fig. 3: Architecture of BEC framework 


3. VALIDATION AND EVALUATION 
3.1 Validation scenario 


The roadmap for the validation scenario, which aims to confirm the viability of the smart contract and the 
authorization workflow, is shown in Figure 5. The AR-01 presents a concern regarding the current window size of 
a medical block, which is 600*1200mm, although the standard suggests a 900*1200mm window. The problem is 
then generated, along with a design document, for review. The domain team leader AR-O reviews the issue detail 
and approves it with the remark that "this is an issue." The technical engineer (TE-1) notes that this problem cannot 
be resolved internally and requires the client's additional input. The quality manager (QM-1) approves this issue 
because it relates to requirements that should be accepted by the client and drawing team. All these permissions 
are carried out by calling the PE smart contract. Results indicate that every authorization action was carried out 
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automatically and stored in the blockchain. Each transaction records the following information: (1) the user ID 
executing the action, (2) the judgments (in this case, "approved") made by various users, and (3) the precise 
remarks made by various parties. 


“AR-O1” raised an issue that the window size of a 
medical module docs not comply with standards — — =O sis 


Current window size is 600°) 200mm 
However, according to the Guide for 
mobile cabin hospital design and 


construction, a 900" 1 200mm window is 
suggested.” 


“Status: Approved SRS : y 
by domain leader “Status: Approved by Status: Approved by 
technical engineer’ quality manager™ 


User ID: AR-0” “User ID: TE-1 User ID: QM-1° 


Fig. 5: Framework validation in a BIM permission execution scenario 
3.2 Framework evaluation 


The evaluation includes two parts. The first is BEC computing performance testing, aiming to see if the proposed 
framework meets the basic data processing requirements. High latency, like what is in the Bitcoin blockchain, is 
not acceptable in construction design. The second is a comparison evaluation to test if the BEC framework 
advances existing BIM solutions regarding design efficiency. 


Blockchain latency measures the time required for a smart contract to register a transaction in the blockchain 
(Ciotta et al., 2021). To determine whether a blockchain's "speed" meets business needs, latency is a crucial statistic 
in the evaluation process. In MHs where rapid information sharing is required, high latency is problematic. The 
latency of three smart contracts is measured in this research. To prevent abnormal findings, the repeated test is run 
ten times. The results in Figure 6 demonstrate that all latency is millisecond-level. No latency standard has been 
established in construction, because the blockchain is currently being built. Consumers won't notice the delay if 
the blockchain application's maximum reaction time is less than 1 second, according to Fatokun (2021). 
Recommendations and best practices from studies (Sheng et al., 2020; Tao et al., 2021) state that millisecond-level 
blockchain latency is acceptable for design and construction procedures. The BEC framework latency is, therefore, 
permissible. 


The quantitative analysis of how automated smart contracts in the BEC framework could promote efficiency is 
shown in Figure 7. Three project participants from the MH project—one BIM manager and two designers—were 
requested to conduct a BIM design coordination case in both the current BIM 360 platform and the prototype based 
on the BEC framework. Because the BEC framework employs a workflow like current practice, the time 
investment in the initial model change is the same (30 mins). The model adjustment required the design team to 
review previous data. The retrieval of a CAD drawing's change history took 11 minutes due to inadequate 
traceability in traditional solutions. The BIM team only spent 3 minutes identifying all BEC framework traceable 
blockchain ledger records thanks to the HQ smart contract. Both solutions required P2P collaboration and took 8 
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minutes. The team leader had to sign a design sheet before it could be pre-revised. The current method required 
10 minutes to obtain a physical signature (including the waiting time). In contrast, calling the PE smart contract in 
BEC took 6 minutes. The BIM team also had to do extra work since they missed a supplier update because they 
were working on a different system; as a result, they had to spend an additional 10 minutes using existing solutions 
to fix the problem. Finally, by utilizing BEC and smart contracts (72 min in total), 23% of the time was reduced 
compared to the current solution (94 mins in total) 
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4. CONCLUSION 


BIM-based design in MHs risks data security issues that will impair job productivity due to stringent time 
constraints and complex design processes. Therefore, this study proposes a BEC framework to increase the design 
efficiency of ECPs while lowering the obstacles to accessing blockchain benefits. Three objectives have been 
achieved. Firstly, through interviewing BIM participants from an MH project, three risks, namely, lacking 
transparency, lacking traceability, and lacking permission automation, that will harm the design process are 
identified. Secondly, a BEC framework is presented to show the technical architecture to guide the data exchange 
and BEC prototype development. Third, three smart contracts are developed for secure information sharing, 
permission execution, and historical data query. Due to time and technological limits, there are still restrictions on 
this early exploration. Non-automatic data verification comes first. The blockchain ledger generates and records 
data fingerprints. The verification procedure is still manual, though. Further research is still required on 
automatically comparing a file with its fingerprint. 
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Kong SAR. 


ABSTRACT: Environmental, Social, and Governance (ESG) investing has become increasingly significant in the 
Architecture, Engineering, and Construction (AEC) industry. However, the AEC industry faces challenges such as 
non-uniform standards, complex information sources, and data security concerns when collecting and verifying 
ESG data. At the same time, as one of the key points of carbon emission in AEC projects, the ESG management of 
construction projects is still lacking. This paper proposed a blockchain-based ESG data management framework, 
which designed to address these challenges in the AEC industry. The framework and the smart contract and 
transaction data model applied in it realize data collection and information verification in construction projects. 
By leveraging blockchain technology's key features of transparency, immutability, and traceability, the framework 
ensures secure and efficient ESG data management. Additionally, the InterPlanetary File System (IPFS) 
technology enables access to original files for data verification and comparison, further enhancing authenticity. 
By integrating blockchain and IPFS technologies, our proposed solution enhances the reliability and traceability 
of ESG data in the construction projects, paving the way for more sustainable and transparent practices. 


KEYWORDS: AEC, Blockchain, Construction Project, ESG, IPFS, Smart contract 


1. INTRODUCTION 


Environmental, social, and governance (ESG) investing refers to a set of standards for a company's behavior, which 
socially conscious investors use to screen potential investments. In the Architecture, Engineering, and Construction 
(AEC) industry, ESG reporting serves as a method for evaluating a company's contributions to environmental 
protection, corporate governance, and financial capability. Investors and government management agencies often 
jointly assess a company's green development prospects based on ESG reports and other indicators. Concurrently, 
some investment institutions and market participants may establish ESG funds and investment expectations 
according to the company's ESG score. These behaviors directly influence the company's capital and stock market 
conditions. However, collecting and verifying ESG-related information is challenging due to various parallel 
standards, unclear evaluation criteria, and data security concerns. Ensuring that collected data can be safely verified 
by a third-party organization is also an essential issue since data tampering may render the company's ESG report 
and score unreliable if all data is controlled by a single department. 


Blockchain technology, characterized by high transparency, immutability, traceability, and non-repudiation, is a 
potential solution for recording transactions and tracking business operations. From cryptocurrencies to smart 
contracts, blockchain technology demonstrates its potential applications in the construction industry (Turk and 
Klinc 2017). In the current research, the intelligent construction platform or technology based on blockchain has 
shown a high degree of usability and advantages. During the construction phase of a building, a large amount of 
data is exchanged between various departments and personnel. Effective management of these data can improve 
work efficiency and reduce unnecessary data and economic losses. Features of Blockchain make it conducive to 
storing and tracing ESG-related data for AEC projects. The construction industry, in particular, plays an important 
part in global carbon emissions management. 


Regarding ESG-related research, the relationship between ESG and corporate performance highlights importance 
of ESG for business (Zhao, Guo et al. 2018). However, there is currently no available solution for AEC companies 
with complex information sources and data formats in construction projects. This paper aims to (1) propose a 
Blockchain-Based ESG Data Management framework, (2) design and apply technical components within the 
framework, and (3) verify the framework's feasibility through illustrative example. In order to provide a usable 
ESG data management method in the construction stage. 


2. LITERATURE REVIEW 
2.1 ESG in AEC Industry 


The ESG (Environmental, Social, and Governance) evaluation framework is a multi-level system. The ESG 
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framework is to measure the ability of enterprises to achieve sustainable development. Since the signing of a series 
of environmental protection and sustainability-focused documents and conventions, such as the Kyoto Protocol 
(Protocol 1997) and the Paris Agreement (Agreement 2015), sustainable development has become an increasingly 
important topic from national to corporate levels. The ESG framework focuses on a company's environmental, 
social, and governance performance rather than financial performance, helping global investors identify genuinely 
sustainable businesses. 


As climate and environmental issues become more severe, ESG assessment for companies has become a growing 
trend. The development of ESG-related management has become a focal point for research and disclosure. In 2021, 
the construction industry emitted approximately 10 billion tons of carbon dioxide, accounting for about 37% of 
global emissions (Programme 2022). Analysis of carbon emissions management or ESG in the construction 
industry is important. Related research on ESG and carbon emissions of engineering projects is also in progress, 
such as carbon emission estimation methods for highway project construction (Liu, Wang et al. 2017), and carbon 
emission analysis based on BIM in construction stage (Li, Fu et al. 2012). 


Popular ESG evaluation frameworks include GRI, ISO and SASB, each with its own set of analytical indicators 
and disclosure requirements. Certified assessment indicators are accepted by various stock exchanges and can 
impact the position of a listed company's stock. Research on the impact of ESG on the economic environment and 
corporate finance (Broadstock, Chan et al. 2021) has proved that ESG has a positive impact on corporate 
investment and market value. In the practices of ESG integration in AEC projects, some large listed companies 
have already attempted and explored this approach. For example, companies such as Gensler, China State 
Corporation, and JLL have carried out ESG analysis and planning. At the same time, industry research also 
conducted an analysis of the impact of ESG on the construction industry (Daszyhska-Zygadto, Fijałkowska et al. 
2022). 


However, there are still some challenges for implementing ESG in the AEC industry. The current ESG standards 
are not uniform, and the information during the construction phase is abundant and complex, making it difficult to 
apply directly. During the ESG-related management of enterprises, a large amount of business data needs to be 
submitted to the review unit. Management methods for sensitive data including materials, supplier data, etc. are 
also not securely secured (Oltsik, J. 2014). Moreover, available statistical standards still need to be constructed. 
As ESG reporting and rating directly affect the stocks and economic situation of companies, ensuring the reliability 
of related data has become a difficult issue that needs to be addressed. The ability to ensure the traceability of ESG 
data still needs further exploration. 


2.2 Blockchain in AEC Industry 


Blockchain technology is a distributed information system proposed by Satoshi Nakamoto. From its initial 
development of cryptocurrencies to the current adoption of smart contracts, blockchain technology has been 
rapidly researched and applied in various fields. Unlike centralized databases, blockchain technology has high 
transparency, cannot be modified, and is traceable. Its structure is shown in Figure 1. Therefore, while ensuring 
the characteristics of information security, the application of blockchain technology can provide benefits such as 
the application of blockchain and Industry 4.0 (Bodkhe, Tanwar et al. 2020), the attempt in the financial system 
(Treleaven, Brown et al. 2017), etc 


Block Head Block Head Block Head 


Transaction Data Transaction Data Transaction Data 
Model Model Model 


Fig. 1: Structure of Blockchain 


In the AEC industry, researchers are exploring the applications of blockchain technology. Examples include a 


328 


blockchain-based architectural design collaboration framework (Tao, Liu et al. 2022), a blockchain-enhanced BIM 
(Building Information Modeling) design process integrated with IPFS (Tao, Das et al. 2021), construction project 
supply chain systems that combine blockchain with IoT (Li, Lu et al. 2022), and a blockchain-based construction 
quality management platform (Zhong, Wu et al. 2020). These blockchain-based frameworks demonstrate the 
potential of blockchain technology to enhance the security and interactivity of construction project information, 
and they contribute to the industry's development of information management capabilities across various domains. 


Blockchain systems offer efficient and trustworthy features for smart construction in the AEC industry. In the area 
of ESG, the requirements for data verification and credibility are high. Due to its high transparency, traceability, 
and undeniable nature, blockchain technology is a good means of recording ESG data. The potential of blockchain 
technology in promoting ESG integration, monitoring, and reporting in the AEC industry is worth exploring. In 
this context, some attempts are already underway, such as research on ESG performance in sustainable supply 
chains (Liu, Wu et al. 2021), the use of blockchain token designs for ESG reputation to create a more 
comprehensive carbon trading market (Golding, Yu et al. 2022), and blockchain-based assessment systems using 
Life Cycle Assessment (Jiang, Gu et al. 2022). However, because the blockchain system cannot store large data. 
And as anew technology, the technical components and workflow applied in the blockchain system are still lacking. 
The current research on blockchain technology in ESG management is limited. 


3. METHOHOLOGY 
3.1 Blockchain-based ESG Data Management Framework 


Based on the analysis of the construction process and the sources of ESG data, this paper proposes a blockchain- 
based ESG data management framework for the construction process, as shown in Figure 2. The framework divides 
the construction process into three parts: project beginning, construction stage, and project delivery. The 
information required for ESG collection is placed in the data layer, with specific information sources and types 
(environmental data or social data) also labeled. 
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Fig. 2: Blockchain-Based ESG Data Management Framework 


The collected information will be uploaded to the blockchain system through the transaction data model. This 
information includes the identity of the uploader and ESG-related information, and the source files' hash values 
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obtained through the IPFS system are also recorded within the blockchain. When the construction project is 
completed or the ESG department needs to compile these data, they can access the information within the 
blockchain system. At the same time, ESG department reviewers can obtain source files through the IPFS system 
and compare the data in the files with the information in the blockchain to verify the information. Due to the 
transparency and immutability of the blockchain, all information uploaded to the blockchain system can be 
properly preserved and traceability is ensured. 


3.2 Transaction Data Model for ESG Management 


The assessment standards for ESG performance are not uniform, and some exchanges accept ESG assessment 
scores from multiple institutions for companies. Therefore, this paper has researched several widely recognized 
ESG assessment standards, compiling their assessment scope and target companies and user groups. Four 
representative standards include GRI, SASB, ISO, and CDP, as shown in Table 1. Among them, ISO's ESG 
performance assessment is dispersed across multiple standards, and some NGOs have adopted these standards. 


Table 1: ESG Standards and Objects. 


Standard Abbreviation Scope Industry Target Detailed Name 
Global Reporting Initiative GRI ESG Universal All parties involved GRI200 GRI300, GRI400 
Sustainability Accounting ESG, Business Industry 
SASB Investor SASB Standards 
Standards Board model assessment 
Organization for , cw ISO 9001, ISO 26000, ISO 
nek ISO ESG Universal All parties involved 

Standardization 14001, ISO 50001 

Carbon Disclosure Project CDP Environment Universal All parties involved CDP Standards 


Based on the definitions of environmental and social-related information within the collected standards mentioned 
above, this paper gathers indicators and data sources related to these types of information in construction projects 
within the AEC industry, as shown in Figure 3. The primary information sources include business documents, BIM 
(which includes design drawings), supply chain information, and construction site management information. After 
filtering and processing, this information will affect the ESG reporting and scoring of the construction project. 


Energy Usage . 
Construction . 

Business Documents 
Management 


Carbon Emissio: lsh J 


Resource Usage 


ANG 


Staff 
Management Building Information 
Modeling (BIM) 


General Waste 


Environmental 
@ i 


Hazardous 


Waste Supply Chain 


Management 
Wastewater 


Supply Chain 
Management 


Construction stage 


Renewable 


T) 
Energy Usage g, 
Energy 


Management 
Employment 0) 
Relationship = 
z Site Management 
cupational 
ealth/Safety g ) Pollution : 
Management 
Procurement 
Management B 


TO 
el 


Fig. 3: Sources of ESG Information in Construction Projects 


In the blockchain system, both data upload and query need to go through the transaction data model, with the 
information stored within blocks. A well-designed transaction data model is shown in Table 2. It includes the 
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information uploading department, information type, information source, data description, and IPFS hash code. 
For instance, when different construction teams upload ESG information, they are required to provide their team 
identity and specify whether the information is environmental data or social data. For energy usage, the type of 
energy and consumption can be described; for materials usage, the type and number of materials can be noted; and 
for personnel management information, descriptions can be provided according to the requirements of the 
governing department. Once stored in the blockchain system, this information can be provided to the ESG 
department for analysis and further processing. When needed, the source files can be found in the IPFS system for 
verification and comparison. 


Table 2: Transaction Data Model 


Attributes Values 
Upload Department Construction Team A 
Info Type E/S 
Info Resource Supply Chain Management 
Describe Concrete C40_50M3 
Date 20230401 
IPFS Code 12D3KooWEU7£Znpc8QBvVrscj... 


3.3 ESG-Construction Stage Smart Contract 


The blockchain system developed in this paper is based on Hyperledger Fabric. In accordance with the designed 
transaction data model, this paper presents a smart contract written in the Go programming language, as shown in 
Figure 4. The smart contract's functions include uploading current information, querying the latest information, 
and querying all information stored on the blockchain. 
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Fig. 4: ESG-Construction Stage Smart Contract 


In practical scenarios, the data upload function will be frequently used to ensure real-time updates of ESG data. 
The other functions of smart contract mainly serve the ESG assessment department. The blockchain-based ESG 
data management framework consists of two developed technical components: the transaction data model and the 
smart contract. These two parts support the usage of the framework. 
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4. ILLUSTRATIVE EXAMPLE 


This paper designs two scenarios to test the usability of the blockchain-based ESG data management framework. 
The first scenario involves data uploading and querying, as this process often needs to be repeated. For example, 
in actual construction projects, large amounts of materials and energy are used by multiple teams at the same time, 
and these records are essential in ESG analysis and score. The second scenario involves the ESG analysis 
department verifying the authenticity of past data. The team can verify the information based on the time 
information in transaction data model, the timestamps of the blockchain system, and the original files in the IPFS 
system. These two scenarios serve to validate the usability and traceability, which are characteristics needed by 
ESG analysis. 


4.1 ESG Data Upload and Query 


In this paper, the framework is built based on Hyperledger Fabric 2.2. For the first scenario, ESG data is fully 
stored within the blockchain. The data structure is consistent with the transaction data model settings, and the data 
query function is also implemented, as shown in Figure 5. 

(b) 
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Info Resource Supply Chain Management 
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Data in the blockchain system can be 
uploaded and queried 


De Data Query 


Fig. 5: (a) ESG Data Shows in Blockchain System; (b) Correspondence Between Blockchain System Data and 
Transaction Data Model; 


In this scenario, Construction Team A received 50 cubic meters of C40 concrete from the supply chain management 
system. Relevant participants were recorded, including the time of the reception event and the IPFS code for the 
source file (receipt document). The verification results for the upload and query functions of this framework were 
successful. 


4.2 ESG Data Verify 


For the second scenario, the verification in this paper utilizes the historical information query function in the smart 
contract. The results show all submitted ESG-related information. Additionally, by restoring the IPFS code stored 
in the blockchain and accessing the IPFS system, the experiment retrieves the material receipt document for 
Construction Site 001 that was stored in the IPFS system during the upload process. The information in the 
document is consistent with the information stored in the blockchain, as shown in Figure 6. 
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Fig. 6: (a) IPFS Hash Value in Blockchain System; (b) Verification of source files obtained through IPFS; 


This experiment demonstrates the successful implementation of the historical information query function in the 
Blockchain-Based ESG Data Management Framework and its integration with the IPFS system. By providing 
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access to the original files through IPFS, the framework meets the traceability requirements in ESG data 
management. 


5. CONCLUSIONS 


In conclusion, the blockchain-based ESG data management framework proposed in this paper effectively addresses 
the challenges of ESG data collection and verification in the construction project. By leveraging blockchain 
technology's advantages, such as transparency, immutability, and traceability, the framework ensures the credibility 
of ESG data while maintaining data security. The integration with IPFS further enhances the data traceability and 
availability. The verification result of the experiment was also successful. 


However, there are still limitations in the research: (1) The framework has only been verified in limited scenarios, 
and its stability and information throughput capacity have not been tested in actual engineering projects. (2) This 
design only considers the construction process, and other stages requiring ESG assessment have not been taken 
into account. In future research, the ESG management process in construction projects should be further considered, 
while further reducing the cost and efficiency of ESG assessment through secure information technology. As 
carbon neutrality goals are established and the demand for low carbon solutions becomes increasingly urgent 
within the AEC industry, further exploration of ESG analysis and technological applications is still needed. 
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A BLOCKCHAIN-BASED SECURE SUBMISSION MANAGEMENT 
FRAMEWORK FOR DESIGN AND CONSTRUCTION PHASES 


Das Moumita, Tao Xingyu, Xu Yuging & Jack Cheng 
The Hong Kong University of Science and Technology, Hong Kong S.A.R. (China) 


ABSTRACT: AEC projects generate numerous versions of BIM models during the design and construction 
phases. This process is complicated by the sheer number of domains in large projects and the interlinkage of BIM 
deliverables (for example the structural BIM model follows the corresponding architectural BIM modell). 
However, due to the generation of multiple versions and parallel design progress in different domains (especially 
in large projects), multi-domain delivery teams often fail to access and comply with the latest/required/approved 
design requirements during the progression of the design phase and complete issue addressing during the 
construction phase, which creates confusion, may lead to disputes. Moreover, due to the contractual nature of the 
parties involved data and process security is also very important. Therefore, this research presents blockchain- 
based secure coordination workflows for effective collaboration, parallel design progress, and issue management 
among BIM developers from multiple domains. Smart contract logic for facilitating dynamic dependency logic 
for coordinating linked multi-domain submission over the project timeline is presented. A method to ensure that 
issues are completely, and timely addressed, and related parties are held accountable for their actions or non- 
response is presented by integrating a BIM change identifier and blockchain in typical issue management 
workflows. The method considers collaborative design and issue management workflows for the secure, efficient, 
and complete design of BIM models. The method is validated using an ongoing large construction project in Hong 
Kong. 


KEYWORDS: Version Management, Issue Management, Security, Blockchain. 


1. INTRODUCTION 


Construction projects generate huge amounts of digital information that requires sharing among stakeholders from 
different construction domains during the design, construction, and operation phases. It is well established that 
effective building data management strategies (including rules, programs, and practices) integrated with a 
Common Data Environment (CDE) is recommended to streamline coordination among project partners to ensure 
project success. However, the existing CDEs are faced with several error-inducing methods/gaps that cannot 
prevent incomplete BIM model delivery during the design and construction phases or hold the responsible 
stakeholders accountable for their actions. In particular, the first problem in existing CDEs is the lack of secure 
(in terms of accountability) and automated coordination methods in multi-domain delivery teams — Construction 
projects especially during the design phase generate a large number of BIM model versions. This process is 
complicated by a large number of domains in large projects and their interlinkage in submission management. For 
example, it is a regular practice to design structural models (the dependent model in this case) based on the latest 
or approved architectural model (the leading model in this case). Similarly, MEP (Mechanical, Electrical, and 
Plumbing) models are designed based on both architectural and structural models. However, due to parallel design 
progress that creates multiple versions in each domain, multi-domain delivery teams often fail to access and 
comply with the latest/required/approved design requirements during the progression of the design phase which 
creates confusion, causing incomplete deliveries and resulting in disputes between the delivery teams. Therefore, 
methods to facilitate automated coordination of multi-domain submission management and ensure accountability 
of delivery teams are necessary. The second problem in existing CDEs is the lack of methods to automatically 
check the completeness of BIM deliverables and to ensure accountability of stakeholder actions during the 
construction phase. The construction phase is more complex in comparison to the design phase due to the 
involvement of real-time data, time-bound tasks, and the addition of new stakeholders such as sub-contractors 
from multiple domains. In big projects, a large number of issues may be generated for reviewing, resolution, and 
approval by multiple stakeholders. Hence integrating automatic delivery completeness checker and stakeholder 
accountability in the traditional issue management workflow is desirable. Some existing CDEs facilitate issue 
resolution workflows with options for manual coordination. However, this may not be efficient in large projects 
involving hundreds of issues and stakeholders. In addition, the existing CDEs lack robust and secure methods to 
ensure stakeholder accountability. 


A BIM-based issue management approach that incorporates the integration of issue management with other BIM 
processes such as design coordination was proposed (Wang & Wang, 2020). Jaly-Zada et al. (2015) extended the 
IFC schema to record a history of changes. A graph-based approach (Moayeri et al., 2017) was investigated to 
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capture the ripple effects of BIM version change. Jiao et al. (2013) developed a version update identifier at a BIM- 
object level to track changes and assign accountability for version change. Commercial and open-source 
applications such as Autodesk BIM 360 (Autodesk), Newforma BIM Track (Newforma, 2021), and Kubus IFC 
viewer (Kubus, 2022) have also deployed version and issue management for BIM deliverables. However, the 
existing approaches lack methods to accommodate submission dependency and completeness-related 
requirements in the traditional version management and issue management workflows. 


Therefore, a “Blockchain-based Secure Submission Management Framework” is presented that captures 
information from Open BIM models using the IFC format, documents, stakeholder actions, and automated 
methods and integrates them with blockchain to facilitate stakeholder accountability. The framework includes — 
(1) a Blockchain-based Dynamic Dependency Workflow to facilitate coordination and stakeholder accountability 
in large projects with dynamic interlinkage among submissions from different domains and (2) a Blockchain- 
based Completeness Checking Issue Management Workflow to facilitate automatic checking for faster/efficient 
issue resolution. To realize this framework, the existing IFC schema was extended to include version and issue- 
related information. An entity to store blockchain transactions was also created in the extended IFC schema to 
integrate IFC models with block-chained information. Blockchain smart contracts and ledger data models were 
developed to irreversibly record stakeholder actions during the version management and issue management 
phases. An efficient BIM change identifier method was developed using the BIM-segmentation and hashing 
method to efficiently parse and identify changes between two BIM models. This method was integrated into the 
traditional issue management workflow to automatically check whether all the issues related to a BIM deliverable 
were addressed or not. A prototype for the proposed framework was developed using the Hyperledger Blockchain 
Platform (Hyperledger, 2020) and was tested in an ongoing project in Hong Kong. 


The remainder of this paper is organized as follows: Section 2 presents the methodology of the proposed 
framework. The framework validation scenario and results are discussed in Section 3. The paper is concluded in 
Section 4. 


2. METHODOLOGY 


This section describes the methodology used to facilitate secure versioning and issue management in construction 
projects. As shown in Figure 1, the framework consists of three modules — (1) version management module and 
(2) issue management module that connects to a blockchain layer. These modules are discussed as follows: 
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Figure 1 Methodology using Open BIM (IFC) for secure, complete, and coordinated Submission Management 
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2.1 Blockchain-based Version Management Module 


This module manages and coordinates submission management considering versions and dependencies from 
different domains in an AEC project. AEC project submissions have dependencies on each other. For example, 
the submissions from the MEP (Mechanical, Electrical, and Plumbing) domain are dependent on the structural 
and architectural model of a building in general. This means the delivery team of the MEP model should ensure 
that they are using the latest/approved version of the architectural and structural models. Figure 2 shows some 
examples of dependency rules identified from a large ongoing project in Hong Kong. However, due to many 
versions generated during the design phase in each domain, the project stakeholders often fail to follow the 
latest/approved/assigned versions of the leading models. In this case, a dispute may happen between the leading 
and the dependent delivery teams causing a delay to the entire project schedule. Therefore, a method including 
IFC extension and smart contract logic was created to ensure that the dependent parties download the 
latest/approved/assigned leading model before they can submit their own models. Along with this checking, this 
action is also recorded in the blockchain to facilitate accountability of the leading and dependent parties at a later 
stage. Figure 3 shows the extended IFC schema which stores information such as Model name (includes domain 
name) and version number. An entity to store a blockchain transaction is also added in the extended IFC, called 
“TnxID” as shown in Figure 3, to link IFC models with blockchain ledgers. 
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Figure 4 Methodology for Dependency Checking by the Version Management Module 


As shown in Figure 4, dependency checking is performed by the version management module in four steps — (1) 
populating the IFC to be submitted with information such as version number and domain name, (2) submitting 
via a web interface, (3) invoking blockchain smart contract containing the dependency rule, (4) checking if the 
corresponding leading models were downloaded, (5) adding a record to blockchain ledger (Figure 5 (a)) and 
updating the uploaded model with a transaction ID. 
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| Timestamp | 01 March 2022 08:01:00 
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Figure 5 Blockchain Data Models for Version Management and Issue Management 
2.2 Blockchain-based Issue Management Module 


As shown in Figure 1, a workflow was developed to integrate issue management logic with OpenBIM and 
Blockchain to ensure issue resolution completeness and accountability. This workflow links project issues, 
stakeholders (Responsible, Accountable, Consulted, and Informed Parties), IFC-based change identifiers in BIM 
models, and blockchain. 
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Figure 6 Methodology of the BIM Change Identifier (Diff) Module using OpenBIM and Hash-based 


Segmenting. 


As shown in Figure 1, a BIM change identifier (called Ifcdiff module) is developed and integrated in the issue 
management workflow, OpenBIM has been used to integrate (interoperability properties) a BIM model change 
identifier (Diff) and blockchain with a traditional issue management workflow. Figure 4 shows the methodology 
of the diff which involves — (a) parsing the IFC models, (b) converting it into a tree-like data structure (using 
OpenShell)-called the main tree, (c) compressing the tree element-wise, space-wise, and zone-wise (as per user 
input) into a hash-based tree data structure, (d) comparing between BIM versions using the hash-based tree and 
identifying differences between hashes, (e) extracting the details of the changed elements for fine-grained 
comparison for geometry, location, and properties, (f) converting element shapes into a mesh data structure and 
perform mesh comparison to identify the change in geometry, (g) compare location and direction matrices to 
identify changes in the position of elements, (h) excel-based report generation recording the changes (addition, 
deletion, and modification based on IFC guides) for manual and automated review, and (i) recording of proof the 
diff file and issue status on the blockchain (as shown in Figure 5(b)). 


The workflow facilitates automated checking of issue resolution using the openBIM-based Diff logic. End users 
such as issue creators can mark the building elements (using IFC Global IDs) (using the IFC extension as shown 
in Figure 3), which are to be added/modified/deleted as per the issue resolution process. The workflow 
automatically updates the issue status as incomplete if all the issues are not resolved and send a notification to the 
issue creator for manual review and override if desired. As shown in Figure 1, the issue creation and resolution 
are recorded on the blockchain for accountability. 


3. VALIDATION 


For validation a prototype was A prototype was developed and tested. Python and Openshell (IfcOpenShell, 2022) 
are used to develop the diff algorithm, which is used to detect model changes that should be tracked and 
blockchained. Hyperledger Fabric serves as the blockchain platform to immutably and traceably record BIM 
coordination, delivery, and operation actions. This platform provides a secure and scalable way to manage BIM 
data, enabling efficient collaboration between different stakeholders. It also provides a transparent and secure way 
to manage BIM data, ensuring that project progress is monitored and tracked effectively. 


The dynamic dependency logic and the blockchain interface and tested for versioning in the Kai Tak Sports Park 
(KTSP) Project. The project has over 40 domains up to March 2023 which created at an average of 200-300 
versions per fortnight from the entire project. This required complex dynamic rule generation for linking 
submissions and managing their versioning. The prototype was tested and found to be effective in the management 
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of multi-domain submission management for the KTSP project. The end users found the platform to have the 
following value additions — (a) team members are sure about the model status (because of blockchained single 
source of truth), (b) team members are sure if the submissions have followed the correct dependency and version 
for model delivery for a milestone/review, (c) accountability of delivery teams to prepare deliverable with the 
correct information is added. 


7 á ten Anchor aine he ta o ¢ 2 


demo test Rude Diagram 


Figure 7 Prototype Web-interface Showing Dynamic Rule Creation and BIM Visualisation 


Figure 8 shows that the BIM change identifier was evaluated on models ranging from sizes 0.3 MB to 730 MB. 
It can be seen that there is a linear rise in the computation time of the diff program with increasing file size. As 
described in the methodology section, the diff logic uses a hash-tree-based data structure that segments the BIM 
model and facilitates comparison for a faster computation time. Figure 8 shows the results of hash-tree based 
segmentation on a model sized 730 GB. This method is particularly efficient in identifying small changes (such 
as a few modifications on a particular element type or floor) in large BIM models. The prototype is also being 
tested on the ongoing KTSP project. The end users have so far evaluated the platform as — the actions of all parties 
are blockchained which will be useful to hold parties accountable if the need arises. They have also stated that a 
confident audit trail record will be created, that will help resolution of future disputes or remeasurement of works 
upon quantifying and monetizing additional works variations. 
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Diff Time For Models of Different Sizes without Hash-Based Segmentation 
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Figure 8 Validation of the Diff Module of the Issue Management Workflow 


4. CONCLUSION 


A blockchain-based framework for version and issue management was presented in this paper. With the advent of 
digitization and the inherent contractual organizational structure in the construction industry, data security, 
integration, and stakeholder accountability has become very important. This is more important, especially in BIM 
projects which contain sensitive information and are difficult to manage in a multi-data owner environment. 
Therefore, the proposed framework includes - (1) Blockchain-based Dynamic Dependency Workflow and (2) 
Blockchain-based Completeness Checking Issue Management Workflow to facilitate integration and 
accountability in large-scale construction projects. For validation, a prototype was developed and implemented 
on an ongoing Hong Kong based project with real end users. The framework was found useful in terms of security 
and functionality by the end users including representing a single source of truth, maintaining version dependency, 
and completing outstanding issues. 
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ABSTRACT: Blockchain technology emphasizes trust and collaboration through distributed networks and is 
deemed to contribute to building information modeling (BIM) based construction collaboration and management. 
However, the open nature of blockchain introduces severe cybersecurity attacks that undermine the trustworthiness 
of construction management. One salient point is identity authentication for security BIM data access in the 
blockchain environment. The traditional public-private key or password authenticate methods are vulnerable to 
malicious theft. Zero-Knowledge Proof (ZKP) is an emerging, password-free method for authenticating identities. 
It allows one party to prove the truth or falsity of a statement to another party without revealing any meaningful 
information to the counterpart. Therefore, this study proposes a preliminary user authentication protocol based 
on the non-interactive ZKP protocol, specifically the zk-SNARK protocol, for adaptive authentication of blockchain 
BIM. The adaptive authentication recognizes a random subset of on-chain historical BIM operation records to 
prove the identity according to the protocol. Without revealing any meaningful knowledge to the authentication 
system, this adaptive data access control prevents password attacks using the BIM records on-chain. Finally, the 
proposed protocol is deployed on the test blockchain and implemented in a preliminary case study to illustrate the 
feasibility and effectiveness of the proposed method. The main contribution of this paper is twofold. Firstly, the 
theoretical contribution is proposing a novel zk-SARKs-based identity authentication protocol that utilizes the on- 
chain BIM operation records. Secondly, the practical contribution relies on presenting a ZoKrates-based workflow 
of generating proofs, creating smart contracts, and deploying on the blockchain for verification. 


KEYWORDS: Zero-knowledge proof, Blockchain, BIM, Construction Management, Identity Authentication. 


1. INTRODUCTION 


In multi-party construction activities, collaboration and trust emerge as significant yet intricate issues. Building 
Information Modeling (BIM) stands as a trending and burgeoning technology within the construction industry, 
facilitating efficient cross-disciplinary collaboration among stakeholders. However, given the inherent 
characteristics of construction activities—encompassing multiple data contributors, consumers, and 
geographically dispersed stakeholders—the centralized, file-based BIM collaboration necessitates stringent data 
access control. This measure is crucial to prevent deliberate cybersecurity attacks, such as login attacks. 


Ensuring that the right individuals access the appropriate BIM content — in essence, addressing authentication 
concerns — has emerged as one of the most critical issues in BIM-based collaboration. For example, Skandhakumar 
et al. (2018) proposed a BIM-based security model presented by BIM-XACML language to facilitate conditional 
access to BIM, and Zheng et al. (2019) offered a new context-aware access control model for the decentralized 
cloud BIM system. These studies cannot avoid password attacks, which represent intentional attacks to steal the 
authentication credentials such as passwords and private keys. 


Blockchain technology brings a transformative alteration from centralized BIM to distributed BIM that is deemed 
to achieve transparent, traceable, and consensus-based trusted collaboration with better encryption and security 
(Subangan & Senthooran 2019; Nawari & Ravindran 2019). Blockchain is a distributed ledger network first 
proposed in 2008 and implemented in 2009 (Zheng et al. 2018). In recent years, the integration of BIM and 
blockchain has been studied extensively, especially in the data security aspects. Inappropriate distribution and 
excessive BIM transparency may lead to a loss of reputation and trust. 


Considerable literature on trust has grown up around the theme of blockchain BIM in different project aspects (Wu, 
et al., 2022; Zhao, Chen, & Xue, 2023). For example, Das et al. (2021) categorized BIM data security into five 
types and emphasized the confidentiality and authenticity of distributed BIM. The confidentiality of BIM 
represents the necessity of safe data access by authorized people or organizations, and authenticity involves user- 
related and data-related regions. The Hyperledger community presented a “privacy data” mechanism that only 
stores the sensitive data as hash code in ledgers and source data such as BIM files off-chain (Androulaki et al. 
2018). Accordingly, Tao et al. proposed an access control model based on asymmetric encryption and sensitive 
BIM model components decomposition (Tao et al. 2022). Much previous research utilized traditional encryption 
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or access control policy to guarantee secure data access in the distributed blockchain environment, which 
encountered the same challenge as the centralized BIM, password attacks. 


Adaptive authentication stands as a potential solution for circumventing password attacks. Zero-knowledge proof 
(ZKP) constitutes a cryptographic technique that empowers the prover to persuade the verifier without divulging 
further meaningful information regarding statements. ZKP has three significant properties: (1) completeness. If 
the statement or witness is true, the verifier can be convinced by the honest prover. (2) reliability. The prover can 
deceive the verifier with a negligible probability if they do not know the statement. (3) zero knowledge. The 
verifier only obtains the information “the prover has this knowledge” without extra meaningful information 
(Goldreich & Oren 1994). These three main properties of ZKP are deemed to contribute to multi-party identity 
authentication for blockchain BIM. Typically, ZKP can be classified into two types, interactive ZKP and non- 
interactive ZKP (Hu et al. 2018). The non-interactive ZKP is widely used in the blockchain environment due to its 
one-way communication between the prover and the verifier, specifically the zero-knowledge Succinct Non- 
interactive ARguments of Knowledge (zk-SNARKs) protocol (Parno et al. 2016; Groth 2016). Anonymous 
currencies such as Zerocash (Sasson et al. 2014) and SERO use zk-SNARK in the blockchain domain for 
implementing privacy transactions. A large volume of published studies describes the role of ZKP in solving 
identity authentication problems of blockchain. Wang et al. (2020) summarized the existing solutions to privacy 
protection issues in blockchain and emphasized the effectiveness of ZKP. Yang et al. (2020) formulated an identity 
management scheme by leveraging smart contract and ZKP algorithms; Sun et al. (2021) proposed a two-part 
framework of ZKP in blockchain as on-chain and off-chain parts respectively, aiming to provide a solution of 
security private data access in the blockchain environment. However, due to the high cost of computation and 
storage of ZKP, the lack of using historical data stored in the blockchain, and the primitive application of 
blockchain BIM, research on ZKP-based identity authentication for BIM collaboration in the blockchain 
environment is still preliminary. 


For a construction project, both BIM files and BIM-related operation records are stored in the blockchain system. 
Due to the limited storage capability of blockchain and the massive-volume nature of BIM files, a typical 
blockchain BIM system is composed of two parts: on-chain and off-chain parts. The on-chain part stores the 
metadata of BIM files, including the file name, size, owners, creating time, version information, operation records, 
and other file-related descriptive data. The off-chain part keeps large-size BIM data such as BIM model files and 
documents of the project. A representative two-part structure blockchain BIM system is proposed by Tao et al. 
(2021), and Xue and Lu (2020) introduced a semantic differential approach to capture changes in the local BIM 
model as transactions on chain. The storage methods of BIM data in the blockchain are out of the scope of this 
paper, but the BIM data itself is possible to facilitate the ZKP-backed identity authentication of the construction 
blockchain. 


This study proposes a Zk-SNARKs-based identity authentication protocol for blockchain-backed BIM 
collaboration. By utilizing the random subset of historical data records in the blockchain, stakeholders involved in 
the construction projects with distinct roles and responsibilities access the blockchain channel through an adaptive 
authentication process. By leveraging ZKP, a dynamic login function is achieved that avoids password attacks and 
provides a trusted identity authentication function. The proposed Zk-SNARKs-based BIM user authentication 
protocol is described in part 2, and a case pilot is illustrated to prove the possible feasibility of the protocol in part 
3. Then, the pros and cons of the protocol are discussed in part 4, and part 5 shows the limitations and future works. 


2. A ZKP-BASED AUTHENTICATION PROTOCOL FOR BLOCKCHAIN BIM 


The two-step authentication processes between BIM stakeholders and the blockchain network is shown in Fig. 1. 
Firstly, the BIM stakeholders act as provers to prove their authority to the blockchain by providing statements, 
which are the knowledge of BIM in their mind. Then, the blockchain verifies the correctness of the statement 
automatically by the deployed smart contract and responds to the BIM stakeholder. 


2.1 The zk-SNARKs-based authentication protocol 


The structure of the proposed zero-knowledge Succinct Non-interactive ARguments of Knowledge (zk-SNARKs) 
based authentication protocol is depicted in Fig. 2. The first layer is the blockchain layer which is composed of a 
block network to store the BIM editing records. On top of the blockchain layer, various smart contracts deployed 
on the blockchain constitute the second layer. Three types of smart contracts for data storage, data querying, and 
proof verification are involved. With these smart contracts, BIM stakeholders can interact with the blockchain 
network, such as uploading the BIM editing records by the data storage smart contract and verifying their identities 
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by the verifying smart contract. The uppermost layer is the application layer, which provides functions for users 
to interact with the blockchain. Two services, namely identity provider (IdP) and BIM server provider (BIMSP), 
are designed in this layer. The IdP are various identities, including project manager, civil engineer, designer, BIM 
engineer, and their combinations. IdP is designed to provide multiple options for users’ identification. The BIMSP 
provides adding, modifying, deleting, and querying operating authorities of BIM models. Authorized users can get 
editing rights corresponding to their identity in the BIM model. 


Knowledge of BIM 
(e.g., Editing records 
of BIM files) 


Provide statements and the response to the challenge 


Stakeholders 


(Contractors) Blockchain System 

(Owners) (Smart Contract) 
(Designers) 

(Authorized person) Verify the knowledge of statements and responses 


Fig. 1: Possible facilitation of BIM in the structure of ZKP for construction blockchain 


Contractor (Client 


Smart 
Contract 
Layer \ 
Data storage 


Genesis 
block 


Fig. 2: Structure of the proposed zk-SNARKs-based identity authentication protocol 


This paper focuses on the smart contract for proof verification, which implements an automatic authentication 
model based on the zk-SNARKs protocol. The zk-SNARKs protocol supports succinct proof verification by the 
one-way message communication between the prover and verifier. Its development processes involve five main 


345 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


steps: 


1) 


2) 


3) 


4) 


5) 


Define the domain-specific data model to describe the BIM-related knowledge for identity authentication. 
For instance, to describe the editing history records of a door, the data of “editing month”, “door _level”, 
and “editing action” should be involved. The “editing action” is further categorized into four subtypes: 
add, delete, modify, and query. 


Describe the logic of the domain-specific data model to be proved using the NP statement such as Rank- 
1 Constraint Satisfaction (RICS). An NP statement means that if you have a solution, it is 
computationally easy to prove it, but it is not computationally to find the solution. In this way, the ZKP 
protocol is completeness and soundness. 


The proof circuit accepts some common parameters as input and generates a Common Reference String 
(CRS). 


Generate proof about the proposition. 


Verify the proof. 


The core step is generating the NP statement of the domain-specific data model, namely the logic proof circuit. 
Many tools are developed to do this work, such as Zokrates, libsnark, snarkjs, etc. In this paper, the Zokrates 
toolbox, which is developed by Ethereum, is implemented to convert the BIM-related data model to the logic- 
proof circuit. A detailed example is presented in Part 3. 


2.2 Workflow 


The workflow of proving the identity to the blockchain system is depicted in Fig. 3. A user defined by IdP first 
requests a login to the blockchain system with a role confirmation process. Then, BIM stakeholders generate proof 
of the BIM-related knowledge, such as the editing of a BIM version at a specific time, to the blockchain. The smart 
contract for verifying the proofs is pre-deployed on the blockchain and verified automatically. If the proof is 
verified as true, a one-time password authenticates the user to log in. The login event will be recorded on the chain, 


too. 


IR és 292 
Blockchain ry 
BIMSP (Smart Contract) lap 


Apply for Log 18 


Sav i 
ave BIM editing records 


| Request Prove 


Get the right to operate BIM 


Fig. 3. The workflow of the proposed protocol 
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2.3 Software implementation 


An identity authentication smart contract (IASC) is deployed on the test blockchain network by ZoKrates (Version 
0.8, available at https://github.com/Zokrates/ZoKrates). ZoKrates (2023) is a toolbox for implementing zk- 
SNARKs on Ethereum, which provides functions such as trusted setup, libraries, and proving schemes et al. The 
Remix IDE is an open-source web-based IDE that creates, compile, tests and deploy Ethereum-based smart 
contracts on the blockchain network (Jain , 2022). Fig. 4 illustrates the interface of generating an IASC and 
deploying it on the test blockchain network by the Remix IDE. 


FILE EXPLORER 


® default -workspace 


QLea 


(a) Definition of the logic circuit (b) Generated proof and IASC 


(C) An example of IASC 


Fig. 4: Interfaces of Remix IDE. (a) define a logic circuit; (b) Generation of proof and identity authentication 
smart contract; (c) an example of IASC 


As Fig. 4(a) shows, the circuit is declared in the “.zok” file alliance with the ZoKrates schema, the private filed 
type represents the secrete input that will not be revealed in the proof process and return a bool value that represents 
the verification result. A BIM user is able to generate a proof file, namely the “proof.json” file in Fig. 4(b), which 
will be sent to the blockchain for verification. The generated IASC is shown in Fig. 4(c), including five parts: (1) 
two data structures that describe the verification key and proof data; (2) three verification functions. 
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3. CASE PILOT 
3.1 Case selection 


A project at the University of Hong Kong (HKU) was selected in this section to demonstrate the proposed zk- 
SNARKs identity authentication protocol. The project was a modular construction project of a student residential 
apartment located at Wong Chuk Hong (WCH), involving 1,224 modules. Each module involves three main phases. 
Each module involves three main phases: 


1) Manufacturing phase that the module is manufactured in the factory in Mainland China; 


2) Logistics phase that transports the module from Mainland to Hong Kong through maritime transportation 
and land transport; and 


3) On-site installation of modules. 
3.2 Mapping of project workflow to zk-SNARKs 


The fragmented construction phases and multi-disciplinary participants require high-level access control to the 
blockchain BIM collaboration platform. The IdP of this project defines three main stakeholders: 


1) The main contractor. Paul Y. Engineering is the main contractor, and is responsible for module 
manufacturing and inspections, module transportation, and on-site installation. Paul Y. utilizes BIM for 
the on-site instructions and proposes potential changes to the consultant and client. 


2) The lead consultant. Architecture Design and Research Group Ltd. (AD+RG) is the lead consultant for 
design and the key author and data contributor of BIM. Architecture, structure, water/drainage, and 
HVAC designers collaborate on the comprehensive BIM. 


3) The client. HKU is the client of this project can access the whole BIM collaboration system. 


According to the category of roles and responsibilities, the statements of the Zk-SNARKs-based authentication 
protocol of each group should be as follows: 


1) Paul Y. Engineering: Browsing history of BIM. For example, “is the statement ‘User A browse the 
architecture BIM Ver. 1.2 on 29" September’ True or False? 


2) AD+RG Ltd: Browsing and authoring history of BIM, including components’ changes and revisions. For 
example, “Is the statement ‘the DELETE operation of sliding windows/doors at level 4 was done before 
June’ True or False?” 


3) HKU: Browsing records of BIM models or related project information in the blockchain transactions. 
For example, “Is the statement ‘so far, there have been 4 approved BIM changes in the structure domain’ 
True or False?” 


To map the project workflow to the proposed zk-SNARKs-based identity authentication protocol, a statement of 
BIM-related knowledge should first be converted to a computational problem. For instance, the statement “User 
A browse the architecture BIM Ver. 1.2 on 29™ September” should be converted as “if version == 1.2 and date == 
29" Sep then result == True else result == False”. Then, an NP statement such as RICS should be developed. 
Specifically, R1CS is a sequence of three vector sets (A, B, C) that satisfy the equation sA s: B = s: C, where 
s is the solution vector and A, B, C are coefficient vectors. The simple “:” represents the inner product operation 
of vectors. The computational statement is converted into several simple expressions to represent the computation 
logic, such as x ® y = z where “©” represents the “+” or “x” operations in the proof circuit. For the example 
computation statement, the operation is “+”. In this way, the project workflow statement is converted to RICS 
circuits for further computation. 
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3.3 Preliminary results 


def mpin(private field level, private field 
currentMonth) -> soolf 

returna level se 4 88 currentMonth < 6; 
} 


Step 1: Coding of the statement è pe " Saias 
Step 2: Compile the circuit to constraints Step 3: Setup to generate the public -private key pairs 


Step 4: Generate proof Step 5: Deploy the smart contract and verify the proof 


Fig. 5: An example of the identity authentication process of the project consultant. 


Fig. 5 gives an example of the zk-SNARKs-backed identity authentication process of the project consultant. The 
Remix IDE (Ethereum 2023) is utilized to deploy the smart contract. The statement “the DELETE operation of 
sliding windows/doors at level 4 was done before June” is coded as a computation function firstly according to 
Sec. 3.2. Then, users can compile the function as a logic circuit with 561 constraints by the compile function 
provided by Remix IDE. The zk-SNARKs require setup to generate a public-private key pair for further proof 
generating, which is done in Step 3. After the setup process, BIM stakeholders generate proof through their private 
input, namely their secret knowledge. As Step 4 shows, the generated proof file is a JSON file that involves data 
of prove schema (elliptic curves) and a hash of proofs. After that, a smart contract with the suffix “.sol” is extracted 
and deployed on the test blockchain network in Step 5. Finally, the proof file is sent to the smart contract and 
verified. As Errore. L'origine riferimento non é stata trovata. shows, the verification of the consultant is true 
in Step 5. 


4. DISCUSSION 


The potential influences of the proposed Zk-SNARKs-based BIM user authentication protocol can be analyzed 
and discussed from different perspectives, including technology, business, and user. 


From a technological standpoint, Zero-Knowledge Proof (ZKP) aligns effectively with on-chain knowledge. 
Through the synergy of Building Information Modeling (BIM) and blockchain, construction knowledge attains 
inherent transparency and becomes readily accessible via the distributed ledger. The blockchain records historical 
BIM operation activities, preserving collaboration traceability. Meanwhile, off-chain BIM files can be securely 
accessed via on-chain indexes. The amalgamation of on-chain and off-chain BIM-related data presents itself as 
suitable knowledge for user identity authentication. Consequently, the integration of ZKP and blockchain BIM 
emerges as a rational and achievable endeavor. 


From a business standpoint, the identity authentication protocol based on Zk-SNARKs fosters a higher degree of 
reliability and trust in BIM-centered collaborations within the blockchain ecosystem. The adaptive access control, 
rooted in BIM knowledge, safeguards user identity privacy while accommodating an array of application scenarios, 
including bidding qualifications. Moreover, this identity authentication solution empowers traditional 
organizations and data providers to securely generate sensitive data. 


From a user perspective, the utilization of ZKP empowers BIM stakeholders to efficiently create a suite of identity 
authentication smart contracts based on their project-specific knowledge. Nonetheless, the adoption of the zk- 
SNARKs-based identity authentication protocol demands a fundamental grasp of ‘computation representation of 
knowledge." In essence, users must formulate the logical representation of project-related knowledge. Regrettably, 
this prerequisite presents a barrier to the widespread adoption of ZKP. 
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5. CONCLUSION 


The advent of blockchain introduces new prospects for transparent, immutable, and secure distributed BIM 
collaboration. Yet, owing to the inherently open nature of blockchain, the integrity of construction management 
faces significant threats from severe cybersecurity attacks, particularly password attacks. Establishing secure 
identity authentication mechanisms for data access within the realm of blockchain BIM becomes the cornerstone 
of trustworthy collaboration. However, conventional methods of identity authentication, such as the traditional 
public-private key pair or password, are susceptible to malicious exploitation. Enter zero-knowledge proofs (ZKP), 
a cryptographic technique empowering the prover to persuade the verifier without disclosing additional meaningful 
information about their claims. Consequently, ZKP is positioned to endorse password-free identity authentication, 
basing approvals on users' knowledge, thus effectively sidestepping the risk of malicious password theft. 


This paper introduces a ZKP protocol, specifically zk-SNARKs, designed for identity authentication within the 
context of the blockchain BIM environment. Through the utilization of the zk-SNARKs protocol and on-chain 
historical BIM editing data, this study establishes an adaptive identity authentication process for collaborative 
efforts based on blockchain BIM. In contrast to conventional password or public-private key authentication 
methods, this study employs the knowledge of BIM and construction projects as the primary means of identity 
verification. ZKP ensures the privacy and security of this knowledge, effectively functioning as the safeguard to 
authenticate identities. 


A pilot case implements the proposed protocol by deploying a smart contract on the test blockchain network. The 
results vividly illustrate the feasibility of the proposed method. Subsequently, we delve into the potential impact 
of ZKP from technological, business, and user perspectives. The theoretical contribution of this research hinges 
on the development of a zk-SNARKs-based identity authentication protocol. This protocol efficiently leverages a 
subset of on-chain BIM editing records. In practical terms, the workflow of the proposed identity authentication 
protocol guides BIM users through tasks such as creating domain-specific circuit descriptions of BIM knowledge, 
developing the smart contract, and deploying it on-chain using tools like ZoKrates and Remix IDE. 


This research is subject to several limitations. Firstly, the zk-SNARKs protocol necessitates a trusted setup before 
generating proofs and theoretically could produce false proofs that appear valid to the verifier. Furthermore, the 
intricate domain-specific knowledge associated with blockchain-BIM-based construction management remains 
unexplored, warranting further investigation to formulate a specialized domain-specific language. Lastly, the case 
pilot's scope is confined to laboratory testing, necessitating more extensive trials in complex real-world project 
scenarios. As a recommendation for future studies, we propose the exploration of more advanced ZKP protocols, 
such as zk-STARK, to effectively address the aforementioned limitations. 
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ABSTRACT: Over time, several procurement methods have been adopted to facilitate the successful delivery of 
construction projects with minimal financial losses in order to offer maximum value to clients. In recent years, the 
Integrated Project Delivery (IPD) procurement model has been introduced for better overall financial performance. 
In this model, every member of the project team has a stake in overall profit or risk irrespective of the extent of 
their roles and change orders and correction of errors and omissions are managed effectively with minimal 
contractual disruptions. This paper aims to address some of the previously cited barriers in earlier scholarly work, 
and it proposes a conceptual framework that integrates two novel concepts towards tackling technological and 
financial barriers in adopting IPD namely, BIM and Smart Contracts (SC). A framework is developed for a BIM- 
blockchain-IPD whereby the BIM model is integrated with blockchain technology, thereby acting as an immutable 
and transparent information repository and a platform for interdisciplinary collaboration in Architecture, 
Engineering and Construction (AEC) projects. The smart contract feature of blockchain technology offers an 
automated equitable distribution of risk and reward amongst project stakeholders based on agreements at project 
inception. Thus, the research contributes to a more efficient project delivery method by avoiding information 
asymmetry amongst stakeholders through a tamper-proof, BIM-enabled Common Data Environment (CDE). The 
proposed framework is validated with qualitative analysis of information obtained based on AEC industry 
procurement workflows. 


KEYWORDS: Integrated Project Delivery, Smart Contract, Blockchain, BIM, AEC, Common Data Environment. 


1. INTRODUCTION 


The American Institute of Architecture defined Integrated Project Delivery (IPD) as “a project delivery approach 
that integrates people, systems, business structures and practices into a process that collaboratively harnesses the 
talents and insights of all participants to optimize project results, increase value to the owner, reduce waste, and 
maximize efficiency through all phases of design, fabrication, and construction” (AIA California Council, 2007). 


Integrated Project Delivery can be broadly categorized as a type of relational project delivery arrangement (RPDA), 
developed to generate a cooperative and trustful climate for project implementation that requires an honest and 
open communication for establishment of a trustful relationship (Lahdenpera, 2012). One key element in this form 
of procurement process is an early integration of the project team at inception. The early integration of different 
project participants has a main influence on the optimization of the design and therefore also on the construction 
as processes become more consistent with less rework (Heidemann & Gehbauer, 2010). 


This paper conceptualizes a scenario where blockchain technology, through smart contracts can be integrated with 
BIM to facilitate integrated project delivery, towards an improved procurement process. It begins with a terse 
review of relevant literature under a few thematic headings. Then, diagrammatic workflows are used to illustrate 
a theoretical framework and the interconnected networking of project stakeholders. After which, a use case 
scenario of a procurement workflow using smart-contract enabled BIM for IPD is used to describe the framework. 


2. LITERATURE REVIEW 


Project delivery methods continuously evolve to address the specific needs and concerns at the times and each 
method has implications on the cost, schedule and quality performance, albeit how much performance is typically 
affected is still unclear. (Sullivan, Asmar, Chalhoub, & Obeid, 2017). Various project delivery types have been 
used in the AEC industry globally such as Design Bid Build (DBB), Design-Build, Design Build Operate and 4) 
Construction Manager at Risk (Roy, Malsane, & Samanta, 2018). Lahdenpera (2012) highlighted six (6) key 
features of RPDA or a cooperative delivery approach namely; a cooperative culture, team formation, administrative 
consistency, commercial unity, planning emphasis and operational procedures (Lahdenpera, 2012). In 
collaborative projects, stakeholders must have a high level if shared understanding with respect to cooperation, 
control and coordination to achieve mutually desired outcomes. (Ali & Haapasalo, 2023). However, the complexity 
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of construction and the multiplicity of stakeholders and their interests raises the probability for disputes and 
conflicts. Alaloul et al (2019) described construction as a fertile seedbed for disputes. (Alaloul, Hasaniyah, & 
Tayeh, 2019). Kumar et al (2020) highlighted 14 factors which lead to dispute in construction in order of hierarchy, 
stating that ambiguous language of contract was the most influential factor, which may also lead to opportunistic 
behavior, delayed response to decisions and unrealistic expectations, which may in turn lead to poor 
communication between project partners, culminating together with other factors to cause payment delays and 
eventually project cost overrun (Kumar Viswanathan, Panwar, Kar, Lavingiya, & Jha, 2020). 


2.1 Challenges and Limitations of Integrated Project Delivery 


Construction supply chains have remained contested, fragmented and highly adversarial because of the conflicting 
nature of demand and supply (Cox & Ireland, 2002). Kahvandi et al (2019) highlighted for limitation categories 
for the use of IPD on projects namely; contractual, environmental, managerial, and technical ones and resolving 
contractual challenges is very effective in resolving environmental, managerial, and technical challenges. 
(Kahvandi, Saghatforoush, ZareRavasan, & Preece, 2019). In less developed construction sectors like Nigeria, 
practitioners are aware of IPD but not as proactive towards its application, of which technological, legal, financial 
and cultural issues are hindering its widespread adoption (Ebekozien, Aigbavboa, Aigbedion, Ogbaini, & Aginah, 
2023). Similarly, lack of interest amongst stakeholders involved in the construction supply chain and negative 
perceptions about the efforts, risk and expenses required in implementing IPD are observed limitations to its use 
(Durdyev, Hosseini, Martek, Ismail, & Arashpour, 2020). 


2.2 Smart Contract Solutions for Construction 


Smart contracts (SC) are contract clauses written in computer programs that will automatically self-execute when 
predefined conditions are met. They consist of transactions essentially stored, replicated and updated in distributed 
blockchains (Zheng, et al., 2020). The construction industry worldwide is known for its adversarial working 
relationships which exist between the stakeholders (Phua & Rowlinson, 2003). Young-Ybarra & Weirsema (1999) 
found trust to be the only component of social exchange theory that had a positive effect on flexibility of strategies 
(Young-Ybarra & Wiersema, 1999) Pishdad-Bozorgi (2017) explored trust dynamics on real world IPD projects 
and both case studies used in the research confirmed that IPD was more effective in building trust than traditional 
delivery method (Pishdad-Bozorgi, 2017). 


2.3 Blockchain and BIM in Construction 


The AECO industry began to actively deploy BIM on projects in the early and mid-2000s (Jung & Lee, 2016). In 
a scientometric review, Liu et al (2019) mentioned that research in the field of BIM has been developing 
continuously and has completely subverted the traditional operation mode of AEC industry, while attracting more 
researchers’ attention at the same time (Liu, Lu, & Peh, 2019). Lawal & Nawari (2022) proposed a BIM-blockchain 
unified ledger to provide traceability for building components for a more auditable real estate valuation. (Lawal & 
Nawari, 2022). One of the most commonly researched blockchain applications in AEC is its integration with BIM 
for improved workflows and processes amongst construction stakeholders, thereby fostering improved 
collaboration. (Nawari & Ravindran, 2019) (Zhang, Doan, & Kang, 2023). BIM adds one or more additional 
dimensions to traditional design approaches which is the information layer that describes physical properties of 
building components. Innovations to the blockchain-BIM integration has made it possible for a shared platform 
like BIM to provide security of sensitive data either through a confidentiality minded framework (Tao, et al., 2022) 
or by using lightweight blockchain-as-a-service prototypes in the case of emergency construction projects (Tao, et 
al., 2023). Applications of blockchain-BIM are prevalent in pre-construction stage for secure and traceable control 
of design documentation, however, as the maturity level of both technologies increase, this integration will cut 
across project lifecycles. 


3. THEORETICAL FRAMEWORK 


The use of Building Information Models (BIM) for generation of information has become widespread in the 
Architectural, Engineering and Construction (AEC) industry in the past decade. BIM also refers to the virtual 
process and workflow that encapsulates all aspects, disciplines and systems of a facility or asset within a unified 
virtual model to facilitate a more accurate and efficient real-time collaboration (Azhar, Khalfan, & Maqsood, 2012). 
BIM is a revolutionary technological development that is rapidly reshaping the AEC industry and transforming the 
way we build, and the AEC industry have pushed stakeholders to use BIM extensively in a streamlined and 
integrated manner over the building lifecycle (Liu, Lu, & Peh, 2019). BIM helps to discover collisions which 
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usually occur during construction in a high number and therefore, the team is able to resolve those already during 
the design phase. A huge amount of time, rework and redesign can be eliminated (Heidemann & Gehbauer, 2010). 


The image in figure 1 shows an interconnected loop between all project stakeholders and between stakeholders 
and the BIM model, which is housed in a cloud-based Common Data Environment. Earlier studies have proposed 
a Common Data Environment (CDE) for secure data storage of digital assets, interdisciplinary coordination, 
management, and versioning of information containers (Sreckovic, et al., 2021) (Wang, Wu, Wang, & Shou, 2017) 
(Pishdad-Bozorgi, Yoon, & Dass, 2020). Figure 1 below is a diagram of interrelationship between all project 
stakeholders. A cloud-hosted blockchain CDE which is the agreed information repository that records all additions 
and alterations to the contained information, is used to house a shared BIM model. Blockchain provides a 
decentralized, automated and secured financial platform which enables multiple parties to control and track 
financial transactions (Elghaish, Abrishami, & Hosseini, Integrated project delivery with blockchain: An 
automated financial system, 2020). 
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Figure 1: Network of project team and BIM model, with smart contracts managing the interrelationships 


The CDE connects to a front-end interface wherein all participants are visibly interconnected and smarts contracts 
embedded between every interconnection act as triggers to automate and record the transition to a new phase of 
engagement once certain conditions / project milestones are reached, as confirmed by the BIM and a physical 
model twin contained in the CDE. The physical twin is derived through the use of IoT and BIM. The emergence 
of IoT has transformed the way data is shared across various sources (Barricelli, Casiraghi, & Fogli, 2019). Digital 
Twins refer to the process of merging the virtual world and real world, and has become a widely accepted tool in 
the Architecture, Engineering, and Construction (AEC) industry due to its ability to enhance cross-disciplinary 
collaboration (Sahal, Alsamhi, Brown, O'Shea, & Alouffi, 2022). The linear flow of information and smart contract 
deployment referred to in Figure 1 is shown below in Figure 2. 
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Figure 2: Linear Information Flow in Integrated Project Delivery 


Smart contracts are deployed at instances where a client 1) engages a new service provider, 2) reviews and signs 
off on information, 3) initiates a bidding process, 4) appoints a contractor and also when consultants 1) review and 
sign-off on documents and procedures submitted by contractors and 2) signs off on construction at project 
completion. 


4. USE CASES 


The use case scenario will be discussed under to main headings; 1) BIM-enabled Integrated Project Delivery and 
2) Smart Contract Payment Method through BIM Monitoring. 


4.1 BIM-Enabled Integrated Project Delivery 


The ability of BIM to replicate physical scenarios throughout the building lifecycle makes it suitable for 
collaborative workflows. BIM and IPD are process innovations that are driven by technology and reconfigure 
social relationships (Rowlinson, 2017). Existing literature suggests that BIM and/or IPD can dramatically enhance 
project performance from conceptualization through building management, and ongoing operations. (Ilozor & 
Kelly, 2012). This scenario leverages the abundance of research in BIM and IPD. The Project Manager (PM) or 
Lead Consultant (LC) creates the initial BIM model and shares it with the client, other consultants and all the 
contractors as they join the project. The PM/LC acts as the network administrator throughout the project. All 
changes as well as milestones are securely recorded in the back-end interface using the smart contracts and these 
milestones are visible to all members of the project team, so every party knows what stage every aspect of work 
is. Contract administration is enumerated under section 4.2 using the principle of Common Pool Resource (CPR). 
BIM-based solutions for IPD have also been proposed to enable accurate cost estimation at project inception when 
little information is available on the front end (Elghaish, Abrishami, Hosseini, & Abu-Samra, 2021) 


4.2 Smart Contract Payment Method Through BIM Monitoring 


Common-pool resources are systems that generate finite quantities of resource units so that one person's use 
subtracts from the quantity of resource units available to others (Ostrom, Gardner, & Walker, 1994). Hunhevics et 
al (2020) suggested that the governance of a Common Pool Resource (CPR) scenario was a useful guide to future 
research and applications of blockchain in construction. (Hunhevicz, Brasey, Bonanomi, & Hall, 2020). This phase 
of the use case deploys the combination of the BIM model, the model’s physical twin on site using IoT technology 
to implement Digital Twins as discussed earlier, and the CPR as illustrated in Figure 3 below. The diagram below 
is a blow-up of the CDE shown in Figure 1. 
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Figure 3: Components of the Common Data Environment 


Earlier sections have shown how workflows of several project stakeholders can be integrated. However, contract 
administration, clarity of contract language and timely payment disbursement are some of the factors which make 
traditional procurement methods cumbersome. In this instance, the human component of contract administration 
and payment disbursement are eliminated by depositing project funds in an escrow account, otherwise referred to 
as the CPR. The CPR acts as a third-party agent except that it is not triggered by any one individual. Once a 
contract milestone is reached by any of the project stakeholders, the digital twin pair of the BIM model and the 
IoT powered construction communicate with the CPR to trigger a smart contract between the client and the 
corresponding project stakeholder. Payment is automated to such stakeholder based solely on the attainment of an 
earlier agreed milestone. 


5. CONCLUSION 


This research has built upon a preponderance of academic endeavors in the field of BIM, Integrated Project 
Delivery, Blockchain, and Smart Contracts within the AEC industry. With recent academic interests in the use of 
BIM at the forefront of some of the most cutting-edge innovations in construction and deployment of smart 
contracts to facilitate payment in construction projects. Beginning with a terse literature review which discussed 
general scholarly efforts around Smart Contracts and IPD, the literature review was broken down into thematic 
areas like challenges and limitations of IPD, Smart Contract (SC) solutions for construction and blockchain-BIM 
integration in construction. A theoretical framework was proposed which situates all stakeholders in an 
interconnected loop and in connection with the blockchain-enabled BIM simultaneously. The BIM is shared 
through a Common Data Environment (CDE). The linear flow of information/instruction in this form of IPD is 
also illustrated. Two use case scenarios helped to visualize the applicability of this framework. The first uses a 
BIM-emabled IPD where the shared BIM model facilitates the construction-phase collaboration amongst the 
project team whereby the Project Manager or Lead Consultant acts as a network administrator and changes are 
recorded using Smart Contracts. The second use case deploys SC payment method through BIM monitoring. In 
this case, a Common Pool Resource (CPR) warehouses the funds required for the project and BIM monitoring 
such as IoT-enabled Digital Twin can be synchronized with CPR and SC to trigger payment instructions once 
physical progress corresponds with the pre-coded digital milestone on the BIM model. Although this concept offers 
a solution to streamline construction workflows, further research is required to elucidate on the algorithmic 
workings of smart contract deployment in a BIM-enabled IPD. 
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ABSTRACT: A significant challenge has long persisted in the construction industry: the lack of a robust incentive 
system to encourage and motivate workers to prioritize safety. While safety culture has been recognized as crucial, 
traditional approaches to incentivizing safe behaviours often encounter roadblocks, such as heavy documentation 
processes, recognition delays, and resource allocation difficulties. This paper addresses this problem by 
introducing an innovative approach to incentivize and cultivate a safety culture in the construction industry. 
iSafeincentive integrates blockchain technology and computer vision to develop a novel solution revolutionizing 
safety monitoring and incentive distribution. Computer vision technology is employed for real-time analysis of 
safety conditions based on on-site images, ensuring the immediate identification of safe practices. Simultaneously, 
blockchain technology safeguards the incentive distribution process's integrity and transparency, addressing 
traditional methods' shortcomings. The findings suggest that iSafeincentive offers an efficient and secure method 
for rewarding safe activities among workers. Furthermore, the integrated platform offers a promising pathway to 
enhance job site safety practices, ultimately reducing accidents and incidents within the construction sector. 


KEYWORDS: Safety Culture, Incentive Programs, Blockchain Technology, Computer Vision, Construction 
Industry, Workplace Safety 


1. INTRODUCTION 


The construction industry presents a dynamic workplace where the constant spectre of accidents and injuries casts 
a shadow over operations (Lim et al., 2021; Won and Soo, 2021). A robust safety culture within this sector is 
imperative, where safety transcends mere compliance and becomes an integral core value deeply entrenched in the 
organizational ethos. Such a culture is characterized by a collective commitment to hazard identification, risk 
mitigation, and the seamless integration of safety into all facets of work ((Zou, 2011); (Barg et al., 2014); (Aksorn 
and Hadikusumo, 2008)). However, several critical challenges impede the development of this vital safety culture. 
First and foremost, construction sites often lack effective data management systems and reliable inspection and 
monitoring mechanisms. This deficit hinders the timely identification and rectification of potential 
hazards(Alexander Laufer and G. Jenkins, 1982). Secondly, there exists a dearth of incentive programs designed 
to motivate construction workers to prioritize safety ((Biggs, Sheahan and Dingsdag, 2005)). These programs have 
the potential to significantly reduce accidents and incidents on construction sites, fostering a resilient safety culture. 


It is essential to emphasize the role of management in motivating construction workers to prioritize safety. 
Management must actively incentivize safe practices, linking desired outcomes to performance ((Alexander Laufer 
and G. Jenkins, 1982)). Furthermore, cultivating the right safety knowledge interpersonal skills, and fostering 
appropriate attitudes and beliefs are pivotal in nurturing a positive safety culture within the workforce ((Biggs, 
Sheahan and Dingsdag, 2005)). (Mohammadi, Tavakolan and Khosravi, 2018) provides insights into the 
multifaceted factors influencing safety performance in construction projects. (Helander, 1991) underscores the 
importance of monetary incentives as a catalyst for investing in construction safety. These findings collectively 
emphasize the significance of providing incentives, improving management practices, and shaping workers' beliefs 
and attitudes toward safety. 


Incentives within the construction industry have shown a demonstrably positive impact on safety practices among 
workers ((Zulkefli, Ulang and Baharum, 2014)). They serve as structured mechanisms for recognizing and 
reinforcing safe practices, thereby contributing to amplifying and consolidating safety standards within the 
construction workforce. (Tang et al., 2008) emphasizes the need for incentives in the Chinese construction industry, 
advocating for alignment with project features to enhance project delivery efficiency. (Huang and Sun, 2009) 
delves into various incentive smart contracts and their design principles, while Tinus (2014) highlights concerns 
regarding their impact on work productivity. (Nurul Fieqah and Ahmad Kazimi, no date) shed light on the 
challenges of implementing Occupational Safety and Health Act (OSHA) requirements, which can exacerbate 
administrative burdens and delay incentive distribution. It is crucial to address these issues and develop more 
efficient and streamlined incentive methodologies in construction. These methodologies should alleviate 
documentation burdens and enhance project performance, ensuring a harmonious balance between safety and 
economic sustainability. 
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In summary, while incentive programs in construction can significantly enhance safety performance, their long- 
term effectiveness requires continuous evaluation and strategic resource allocation. Addressing challenges 
associated with traditional incentive methods is essential to boost safety and productivity in the industry. 
Streamlining the monitoring process is critical, as current safety inspections suffer from issues like infrequent and 
inadequate assessments, exacerbated by limited resources and human errors. Moreover, a shortage of safety 
professionals leads to less thorough evaluations. A significant limitation is the absence of a comprehensive 
incentive system aligning with inspections and rewarding safe behavior. 


To effectively address these challenges, this study aims to develop and implement an innovative incentive system 
that harnesses blockchain technology and computer vision. This system is designed to enable real-time monitoring 
and recognition of safe behaviors exhibited by construction workers to comprehensively enhance safety practices 
across various construction sites. 


2. LITERATURE REVIEW 


In recent years, blockchain technology has gained substantial traction within the construction industry, presenting 
novel solutions to address enduring challenges. These studies investigate blockchain technology's adoption and 
potential applications, specifically focusing on its capacity to enhance efficiency, transparency, and safety in 
construction operations. For example, several studies have adopted blockchain for enhancing information 
management in Modular Integrated Construction. (Pan Zhang, 2023) employs game theory to delve into this 
subject, highlighting the pivotal role of diffusion rates influenced by benefits, costs, and government subsidies. 
The study advocates for implementing pilot projects and governmental incentives to facilitate adoption. In another 
study, (Minju Kim, 2023) introduces a blockchain-based system to optimize off-site construction supply chains. 
Using Bayesian updating and incentives, the study aligns contractor and supplier decisions, ultimately improving 
transparency and reliability. 


Another investigation by (Pan Zhang H. W., 2023) employs game theory to analyze the adoption decisions 
surrounding blockchain technology in Modular Integrated Construction. This study echoes the importance of pilot 
projects and government incentives as drivers of adoption. (Hossein Naderi, 2023) introduces a decentralized 
application utilizing blockchain and computer vision to incentivize construction safety through token rewards. The 
application autonomously evaluates safety performance and issues unique Non-Fungible Tokens (NFTs) as 
rewards while maintaining user confidentiality. It presents promising prospects for various domains; however, 
scalability issues and challenges related to individual incentivization must be addressed through further 
development to expand its practical applications. In addition, (Namya Sharma, 2022) provides an exhaustive 
review of 33 global strategies for managing Construction and Demolition waste, focusing on integrating Circular 
Economy principles and lifecycle thinking, particularly in the Indian construction sector. 


In another application, (Wenli Yang, 2022) proposes a master-slave chain model and a hybrid consensus algorithm 
to enhance the efficiency and security of multidomain conversational interactions on a blockchain. It effectively 
manages various scenarios concurrently while maintaining fault tolerance. However, it faces challenges related to 
high capacity demands due to diverse data types, necessitating further exploration of big data verification and 
consistent storage management. Finally, (Liupengfei Wu, 2022) introduces a blockchain-based supervision (BBS) 
model to improve supervision and security in cross-border logistics within modular construction. The model 
employs incentives to encourage data sharing, resulting in enhanced product accountability and data traceability 
compared to centralized platforms. Nevertheless, it encounters limitations, such as the potential for opportunistic 
behavior in data entry and a static incentive mechanism. 


In conclusion, while these studies contribute significantly to our understanding of blockchain adoption and its 
applications in construction and related domains, they collectively share limitations such as theoretical orientation, 
lack of empirical evidence, oversimplified stakeholder models, and the need for further practical validation. 
Addressing these limitations is crucial for advancing the field and ensuring the real-world viability of these 
concepts. 


3. RESEARCH METHOD 
3.1 Process 


To address the objective of this study, the approach is to integrate computer vision technology, which allows for 
the automated analysis of safety conditions from site images. Computer vision eliminates the need for extensive 
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manual inspections and enables more accurate and efficient safety assessments. With reduced human intervention, 
safety assessments become consistent and unbiased, greatly enhancing the reliability of inspection results. 


The integration of blockchain technology, in conjunction with computer vision (Figure 1), presents a 
groundbreaking shift in the construction industry's management of safety inspection data. Blockchain's 
immutability is pivotal in upholding the integrity and permanence of safety inspection records. Once safety data 
is securely recorded on the blockchain, it becomes impervious to alteration or tampering, instilling a high level of 
confidence in the precision and authenticity of the information. This characteristic is paramount in safety 
inspections, as it safeguards historical safety assessment records, reinforcing their credibility for compliance 
verification, auditing, and accountability purposes. Furthermore, the symbiotic fusion of blockchain and computer 
vision technologies fosters enhanced collaboration among various stakeholders involved in construction projects. 
Digital platforms, underpinned by these innovative technologies, enable real-time data sharing and communication. 
This digital transformation empowers project managers, safety officers, construction workers, and regulatory 
authorities to access and comprehensively review crucial safety inspection data promptly. 


Consequently, stakeholders can respond swiftly to emerging safety concerns, coordinate preemptive measures, and 
effectively address potential hazards, ultimately cultivating a safer work environment. This technological 
convergence signifies a pivotal departure from traditional paper-based data collection methods, significantly 
enhancing the efficiency of safety inspection processes. Eliminating manual data entry and paperwork markedly 
reduces the likelihood of errors and omissions, elevating the precision of safety assessments. Furthermore, digital 
platforms support automated data analysis, providing real-time insights and reports. This data-driven approach 
empowers safety officers and managers to make well-informed decisions expeditiously, proactively mitigating 
risks and amplifying overall safety performance. 


Issues and Solution 


Issues to Secure Efficiency and Reliability in Safety Inspection 


¥ Paperwork-based v Fragmentation 


v Site safety conditions a i 
! y da data collection among site personnel 


monitored manually 
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Y Little or no need for human output and process of computer 
intervention vision-based safety inspection 


EA Computer vision Blockchain *2 


How to Possibly Solve the Issues of Safety Inspection 


Fig. 1: The existing condition of safety inspections and suggested technological solutions to mitigate their 
challenges. 


In the contemporary construction industry, a conspicuous challenge resides in inadequate motivation and 
incentives to stimulate safety performance and regulatory compliance among its workforce. This deficiency results 
in a series of interconnected issues that reverberate throughout the construction ecosystem. Site managers, at times 
overwhelmed by their responsibilities and bereft of tangible incentives for meticulous oversight, may inadvertently 
neglect safety protocols or make critical errors. Furthermore, the scarcity of qualified site managers compounds 
these challenges, leading to lapses in enforcing safety measures. Concurrently, the reliability of management 
records concerning safety compliance becomes questionable, diminishing the effectiveness of oversight 


362 


SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


mechanisms. This confluence of factors not only jeopardizes the well-being of construction personnel but also 
raises concerns about the quality and safety of the final built environment. 


In response to these pressing concerns, as depicted in Figure 2, a multifaceted approach is emerging within the 
construction industry, driven by the amalgamation of technological innovation and incentive systems. This 
approach extends from project bidding, wherein contractors can accrue additional points or insurance rate 
discounts for committing to stringent safety and quality standards, to insurance and guarantee providers offering 
reduced premiums as rewards for safety adherence. Notably, the proposal of a token-based incentive system 
underpins this transformation, leveraging blockchain and verification technologies to bolster the adequacy and 
reliability of information generated within the construction milieu. This incentive-driven paradigm fosters 
voluntary safety activities among all stakeholders, from equipment and material suppliers to structural consultants 
and safety inspection agencies. By incorporating bottom-up perspectives, this holistic shift aspires to invigorate 
safety culture within construction, ensuring that every participant is vested in the collective goal of elevating safety 
standards and mitigating risks across the industry. 


Current approach to safety inspection Paradigm shift 
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Fig. 2: Token-based incentive system 
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Furthermore, utilizing blockchain-based evaluations introduces a novel incentive structure that encourages the 
voluntary participation of managers and workers in safety activities. Rewards are distributed following the 
evaluation results, directly linking safety performance and tangible incentives. This incentivization model can 
significantly enhance safety awareness and engagement among construction personnel. In addition to rewards, the 
system can incorporate mechanisms such as additional bidding points, safety ratings, and reductions in insurance 
fees, all contingent upon the evaluation scores of the workforce held by companies. This multifaceted approach 
promotes safety at individual and organizational levels. It aligns safety objectives with broader project and 
financial considerations, making it a comprehensive and effective strategy for improving safety conditions in the 
construction industry. 


3.2 Applications 
3.2.1 Data management 


The process of safety condition analysis from on-site images is a multifaceted procedure that seamlessly integrates 
automated image analysis through computer vision and robust data management, all while leveraging the security 
of blockchain technology. This comprehensive process involves regular and irregular inspections facilitated by 
deep learning-based models and cloud-based computing resources. In the first step, regular safety inspections are 
conducted as scheduled assessments of construction sites. Various devices, such as smartphones or cameras, 
capture images during these inspections. Subsequently, the captured images are uploaded to a cloud-based system 
for further analysis, combining the power of computer vision for real-time safety assessment. As shown in Figure 
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3, once images are uploaded, the inspection process is initiated through user queries and system management. 
Users interact with the system to initiate inspections based on predetermined intervals or specific triggers, ensuring 
that safety conditions are consistently monitored using computer vision technology. The system, in turn, effectively 
manages the entire inspection process, overseeing data collection, analysis, and database management, all while 
benefiting from the transparency and security of blockchain integration. This systematic approach ensures that 
inspections are conducted regularly and streamlines the overall process, reducing delays and improving safety 
outcomes. A key aspect of this procedure lies in data management, further enhanced by blockchain technology. 
Within the cloud-based system, various types of data are meticulously handled. This includes system management 
data to maintain the integrity and functionality of the inspection system, project data to manage project-specific 
details, and user data to regulate access rights and profiles. Inspection data remains at the core of this process, 
involving records of all inspections and associated metadata such as timestamps, geospatial information, media 
types, and automatic extraction of relevant metadata from the images. Blockchain technology ensures the 
immutability and transparency of these critical data records, providing a secure and tamper-proof foundation for 
the entire safety analysis and incentive distribution process. 
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Fig. 3: Blockchain & Computer Vision Integration 


3.2.2 Incentive mechanism 


The token distribution process within the iSafeincentive platform, facilitated by blockchain technology, is 
characterized by a systematic sequence of actions. Commencing with continuous monitoring by the platform's AI 
detectors, the focus is assessing job site activities, including adherence to Personal Protective Equipment (PPE) 
regulations and safe conduct. Subsequently, vetted data undergoes scrutiny and validation to ensure accuracy and 
reliability. At the core of this process are smart contracts, meticulously crafted with predefined criteria and rules. 
These smart contracts serve as the automation engine, enabling the precise allocation of tokens in response to 
identified safe activities. Tokens are directed to the workers' digital wallets, tightly linked to their unique 
blockchain identities. Simultaneously, each transaction is recorded within the blockchain ledger, a fandamental 
feature that underpins transparency and traceability. 
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Worker notification follows promptly, serving dual purposes: acknowledging the safe behaviors observed and 
motivating continued adherence to safety protocols. Furthermore, this approach establishes an efficient and 
transparent record-keeping system. Figure 4 represents the utilization of blockchain technology to streamline the 
token distribution process. This systematic and secure procedure inspires trust among all stakeholders, enabling 
workers to employ their tokens in various capacities while cultivating a resilient safety culture within the 
construction industry. 
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Fig. 4: The procedure for transmitting tokens once the safety measures have been verified through blockchain 
technology. 


4. CONCLUSION 


In conclusion, safety culture is vital in the construction industry, significantly influencing safety performance and 
outcomes. Empirical evidence supvorts the importance of a safety culture in reducing accidents and incidents 
within construction organizations. Motivating construction workcrs to prioritize safety through incentive programs 
has been crucial. Incentives encouraged safe behaviors, recognized individual efforts, and fostered a collective 
sense of responsibility for safety. However, as discussed earlier, current incentive methodologies face challenges, 
including burdensome documentation requirements, delays in distribution, and resource allocation issues. 
Streamlining documentation processes, reducing distribution delays, and optimizing resource allocation have been 
essential steps to enhance the effectiveness of incentive programs. 


In this study, iSafeincentive has been developed by integrating computer vision and blockchain to automate safety 
assessments, improving accuracy and efficiency and ensuring the integrity and immutability of safety records and 
incentives. Blockchain-based incentives have linked safety performance to tangible rewards, enhancing safety 
awareness and engagement among construction personnel. This multifaceted approach has aligned safety 
objectives with broader project and financial considerations, demonstrating its potential as a comprehensive 
strategy for improving safety conditions in the construction industry. 


The safety condition analysis process from on-site images combined automated image analysis with systematic 
inspection strategies, ensuring regular and reliable safety assessments. Token distribution through blockchain 
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technology followed a systematic sequence, promoting safe behaviors and cultivating a resilient safety culture 
within the construction industry. Overall, these innovations held great potential for enhancing safety and 
productivity in construction. 
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MULTI-ASPECTUAL KNOWLEDGE ELICITATION FOR 
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ABSTRACT: Efficient optimization of business processes required a profound understanding of expertise 
provided by domain specialists. However, extracting such insights can indeed be a laborious and time-consuming 
endeavour. This paper introduces the Multi-Aspectual Knowledge Elicitation framework (MAKE4ML) — a novel 
approach designed to effortlessly and effectively extract valuable information from domain experts. This 
framework inherently facilitates the development of machine-learning models capable of optimizing business 
processes, thereby diminishing reliance on experts. The framework's application within a food warehouse 
company is showcased, specifically targeting the enhancement of the procurement process. The employed 
methodology revolves around conducting comprehensive interviews with procurement experts, thereby enabling 
a meticulous exploration of diverse facets inherent to a business process. Subsequently, the gathered insights are 
employed to conceive and calibrate a machine learning model (time series forecasting). This model effectively 
emulates the domain experts’ proficiency, offering invaluable decision-oriented insights. The outcomes of this 
study show that our framework allows efficient knowledge elicitation, which is a pivotal factor in formulating and 
deploying a bespoke machine-learning model. The proposed approach can be extended into various other 
business processes, thereby paving the way for operational refinement, cost reduction, and amplified efficiency. 


Keywords: domain experts, knowledge elicitation, multi-aspects, machine learning, procurement optimization, 
warehouse, technology acceptance. 


1. INTRODUCTION 


The growing demand for digitalization and process optimization has led to the integration of machine learning 
(ML) models across industries. This integration often requires insights from domain experts, necessitating the 
extraction of pertinent information to design tailored ML models. Researchers and ML engineers have employed 
various techniques, such as feature selection and knowledge elicitation, to enhance model accuracy while ensuring 
successful technology adoption. Studies have highlighted the challenges of knowledge elicitation, which 
significantly affect ML performance across disciplines. Researchers are increasingly exploring human 
involvement in ML workflows (D’Angelo & Palmieri, 2020; Park et al., 2023; Sundin et al., 2022; Wang et al., 
2021), combining expert knowledge with data from diverse sources (Ademujimi & Prabhu, 2021; Ben Brahim et 
al., 2022; Hu et al., 2019; Huang et al., 2019; Lee et al., 2020; Seymoens et al., 2019), and innovative ways to 
extract insights (Afrabandpey et al., 2019; Campos et al., 2018; Cheung et al., 2011; Crerie et al., 2009; El-Assady 
et al., 2020; El-Assady et al., 2019; Mantik et al., 2022; Mozina et al., 2018; Park et al., 2021; Yazici et al., 2022; 
Young et al., 2022). 


First, human involvement in ML workflows referred to as "Human-in-the-loop", aims to create cost-effective 
prediction models by incorporating human knowledge during data preparation and refinement stages. Secondly, 
the integration of expert-derived knowledge with data from sources like sensors refines training objectives and 
contextual alignment, as standard sensor data might lack external factors' consideration. Finally, Intuitive 
techniques (e.g., decision-mining, process mining) bridge gaps between ML engineers and multi-disciplinary 
experts, translating meaningful insights into ML model specifications. 


The central research question is: "How can we extract meaningful knowledge from domain experts for designing 
ML models while ensuring user acceptance?" To address this, a comprehensive framework is proposed, involving 
multi-aspectual knowledge extraction, translation into ML specifications, visualizing business workflows, and 
capturing decision-making rules and constraints. Key contributions include a multi-disciplinary knowledge 
extraction framework, translating knowledge into ML and software specifications, and visualizing business 
workflows and decision rules. The framework's efficacy is demonstrated in a warehouse setting, focusing on 
procurement. Experimental results reveal the successful extraction of diverse expert knowledge. 
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2. RELATED WORK 


2.1. Human Involvement in Machine Learning Workflow 


In recent years, there has been a growing interest in human involvement in the machine-learning workflow. The 
use of human-in-the-loop techniques has been proposed to improve the performance and reliability of machine 
learning models. Several studies have investigated how human expertise can be used to improve the performance 
of machine learning models. 


In the field of data science, (Wang et al., 2021) introduced AutoDS, an automated machine learning system that 
aims to support data science projects by automating tasks such as data exploration, model training, and model 
selection. This system proposes suggestions (ML configuration, pre-process data, etc.) to the users via a web- 
based graphical interface where they can interact and make amendments. They showed that the proposed system 
improved the productivity of ML workflow while delivering better models. 


In the field of aerospace systems, (D’Angelo & Palmieri, 2020) proposed the use of genetic programming to 
extract knowledge from aerospace structural defects by providing a mathematical model of the defects, which can 
be used for recognizing other similar ones. They found that their approach was effective in building reliable 
models of the defects and can be considered a successful option for building the knowledge needed by tools for 
controlling the quality of critical aerospace systems. 


(Sundin et al., 2022) proposed a principled approach to use human-in-the-loop machine learning to help chemists 
adapt the multi-parameter optimization (MPO) scoring function to better match their goal. They proposed a 
method that uses a probabilistic model that captures the user’s idea and uncertainty about the scoring function and 
uses active learning to interact with the user. They showed the effectiveness of their approach in two simulated 
examples achieving significant improvement in less than 200 feedback queries. 


Overall, these studies demonstrate the potential of a human involved in the ML workflow to improve the 
performance and reliability of machine learning models. However, further research is needed to understand the 
best ways to incorporate human expertise into the machine-learning process, and how to effectively balance the 
trade-offs between automation and human involvement. 


2.2. Fusion-driven learning 


The fusion-driven learning consists of the fusion of knowledge experts with data collected from other sources to 
improve the performance of machine learning models. Indeed, several works have been introduced to leverage 
the strengths of both human expertise and data-driven methods to create more accurate and reliable models. The 
most representative works are discussed hereafter (Ademujimi & Prabhu, 2021; Ben Brahim et al., 2022; Hu et 
al., 2019; Huang et al., 2019; Lee et al., 2020; Seymoens et al., 2019). 


(Huang et al., 2019) propose a hybrid approach for identifying the structure of the Bayesian network (BN) for the 
threat assessment of mass protests. They demonstrate that traditional methods for discovering BN structure from 
data or experts were inadequate, and instead proposed a hybrid approach (ISM-K2) which enhanced the BN 
structure learning methods via a knowledge elicitation method called ISM (Interpretive Structural Model). 


(Ademujimi & Prabhu, 2021) introduced a method for fusion-learning of Bayesian network (BN) models for fault 
diagnostics. They proposed an approach for expert knowledge elicitation of the BN structure aided by logged 
natural language data and sensor data. They found that the resulting fused BN model improved diagnostics as it 
had a wider fault coverage than the individual BNs. 


(Hu et al., 2019) developed a methodology that combines sensor data with domain expert knowledge to improve 
energy fault detection. The proposed methodology includes an engagement process with experts in the energy 
system field to identify relevant data, an integration of domain knowledge with sensor data, an automatic selection 
of potential input data, and the use of machine learning to automatically build a data-driven fault detection model. 


(Lee et al., 2020) presented an interactive machine-learning approach to improve the assessment of rehabilitation 
exercises by integrating a data-driven model with expert knowledge. This approach uses reinforcement learning 
to identify the most salient features of the exercise motions and generates a user-specific analysis to elicit feature 
relevance from a therapist for a personalized rehabilitation assessment. This study improves the performance of 
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predicting assessment and demonstrates how machine-learning models can improve with expert knowledge for 
personalized rehabilitation assessment. 


Overall, these studies demonstrate the potential of fusing knowledge from experts with data collected from other 
sources to improve the performance of machine learning models. However, further studies are needed to find 
techniques and methods to easily and efficiently fuse data used to train ML models while achieving the best 
performance. 


2.3. Knowledge Elicitation Methods 


There have been several studies in the past that have aimed to improve the efficiency and effectiveness of machine 
learning models through the incorporation of expert knowledge. These studies (Afrabandpey et al., 2019; Campos 
et al., 2018; Cheung et al., 2011; Crerie et al., 2009; El-Assady et al., 2020; El-Assady et al., 2019; Mantik et al., 
2022; Mozina et al., 2018; Park et al., 2021; Yazici et al., 2022; Young et al., 2022) have proposed various methods 
for extracting and utilizing expert knowledge: active learning, process mining and decision mining, and human- 
in-the-loop approaches. 


One widely used approach is active learning, where a model is trained on a small initial labelled dataset and then 
iteratively queries the expert for labels on the most uncertain samples. (Mozina et al., 2018) propose a data-driven 
tool for the semi-automatic identification of typical approaches and errors in student solutions for a programming 
course. They used the argument-based machine learning (ABML) method, which interactively exchanges 
arguments with an expert until the model is good enough. Similarly, (El-Assady et al., 2019) present a framework 
that integrates speculative execution, allowing users to preview the potential consequences of their actions with 
the model and make more efficient decisions. 


Another approach uses process mining and decision mining to identify operational processes, viz., business rules. 
Indeed, (Campos et al., 2018) applies a decision-mining technique in an event log of a real company to discover 
tacit decisions that could be translated as business rules. In the same way, (Crerie et al., 2009) relies on process 
mining and data mining techniques to extract two sub-types of business rules: condition action assertions and 
authorization action assertions. Likewise, (Alkofahi et al., 2022) introduces a method to elicit business rules from 
real-world web applications; these rules are defined as one-to-one and one-to-many implicit dependency 
relations, thus minimizing the negative effect of substitute relations in decision-making. 


A third approach relies on the concept of human-in-the-loop, (Afrabandpey et al., 2019; El-Assady et al., 2020; 
Park et al., 2021) whereby human experts are added into machine learning pipelines, allowing them to provide 
feedback or guidance at various stages of the model development process. For instance, (Park et al., 2021) describe 
a framework called “Ziva” that guides domain experts in sharing their knowledge with data scientists for building 
natural language processing (NLP) models. (Afrabandpey et al., 2019) introduce a method to elicit expert 
knowledge about pairwise feature similarities and use sequential decision-making techniques to minimize the 
effort of the expert while improving the prediction performance on a small dataset. (El-Assady et al., 2020) has 
developed a framework allowing users to provide semantics of their knowledge, which will contribute to topic 
model refinement. 


Finally, (Yazici et al., 2022) performs knowledge prioritization after the elicitation from domain experts. The 
authors use knowledge elicitation and feature selection techniques to identify the most prevalent tacit knowledge 
variables, which are then prioritized using machine learning methods and the fuzzy Analytic Hierarchy Process 
(AHP). 


Overall, various studies have proposed methods that allow the intuitive extraction of knowledge from experts and 
train optimized machine learning models. Although these methods allow knowledge elicitation, there are several 
areas and aspects that have not been (or have been poorly) considered so far. In this work, we aim to introduce a 
framework that considers the multi-aspect of concepts defining the context, their interdependence and translation 
into tailored specifications. 
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3. METHODOLOGY 
3.1. Multi-aspectual Knowledge Elicitation (MAKE) 


MAKE, multi-aspectual knowledge elicitation, developed by Winfield (Winfield, 2000) for planning and building 
knowledge-intensive systems, is based on Dooyeweerd's aspects (Table 1). By guiding and stimulating the 
participants to identify aspects that are important to their situation and opening up their constituents, MAKE 
begins with the most obvious aspects and gradually uncovers the relevance of each. As Winfield (Winfield, 2000) 
found, MAKE stimulated the participants to consider broader issues, lay participants were able to grasp the 


meaning of aspects and work with them during analysis, and some tacit knowledge was explicated through 
MAKE. 


(Winfield, 2000) developed two visual tools to help multi-aspectual analysis. One employs a flexible mind map 
to build up an understanding of inter-aspectual relationships. The second method employs the Christmas Tree, 
designed to provide an overall picture of areas of concern that emerge during discussions. Any significant positive 
or negative repercussion that emerges can be ‘hung on' the tree at the aspect in which it is meaningful, with the 
positive on one side and the negative on the other. As the picture develops, patterns emerge showing areas of 
significant benefit or problems, which can be clarified and tackled during the design and development process. 


3.2. Proposed Framework 


The framework in (Fig. 1) relies on the 15 aspects of Dooyeweerd (Basden, 2011) (Table 1) used in MAKE 
(Winfield, 2000) to elicit knowledge from domain experts through interactions, which allows an understanding of 
what is meaningful to them. Indeed, it uncovers the elements that are often not immediately apparent but contribute 
significantly to overall technology acceptance and success while avoiding unintended consequences. Our 
proposed framework consists of five key steps. 
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Fig. 1: Overview of the proposed Multi-Aspectual Knowledge Elicitation framework. 


Table 1: fifteen aspects of Dooyeweerd (Basden, 2011) and their meaning 


Aspect Meaning 
Quantitative Discrete amount 
Spatial Continuous space 
Kinematic Movement 
Physical Energy + mass, forces 
Biotic/Organic Life functions + organisms 
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Sensitive/Psychic Sense, feeling, emotion 
Analytical Distinction, conceptualization 
Formative Achievement, construction, history, technology 
Lingual Meaning carried by symbols 
Social ‘We’: relationships, roles, convention 
Economic Frugal management of resources 
Aesthetic Harmony, play, enjoyment 
Juridical Due: responsibilities + rights 
Ethical/Attitudinal Self-giving love, generosity 
Pistic/Faith Vision, aspiration, commitment, belief 


(1) Identification of business processes: analyse the company's internal/external processes to identify multiple 
cross-functional processes, data points, systems, and non-value-added operations. This step is essential to the 
identification of key data feeds (internal/external) favourable to the collection of intelligence to guide decision- 
making. 


(2) Identification of critical success factors: gather insight from senior members and staff users of the company 
to understand the business and contextual issues. This step relies on a series of interviews where the discussion 
could turn around topics like system usefulness, job security, the impact of the technology on users and their work, 
user’s attitudes to technology, skill levels and other factors that are meaningful to users. 


(3) Visual representation of extracted concepts: take concepts from step (2) and map them against 
Dooyeweerd’s aspects (e.g., technical, social, economic, ethical, etc.) to identify any gap or missing concept that 
will require further investigations (or interviews). The visual representation of extracted concepts highlights any 
laws, axioms, data, definitions and constraints that apply to the domain of the project. 


(4) Identification of tangible and non-tangible factors, extra meta-level rules and parameters: provide a 
domain conceptualisation and presentation to an expert including different aspectual views to select the aspectual 
view(s) in which experts see their domain expertise lying. This step goes through the loop of detailed knowledge 
acquisition to identify business process workflows and decisions making scenarios. 


(5) Integration of insights into the company business processes: propose a specification and design of the 
software solution to be integrated in the company information system to improve and overcome the existing 
limitations or challenges. 


3.3. Knowledge Elicitation: application 


The proposed framework relies on a series of interviews with domain experts or managers who have strong 
knowledge and understand the business processes. The application of this framework to a business starts with 
“Tutorial” interviews (Winfield, 2000), where the expert is asked to prepare a talk outlining the whole domain. , 
This helps provide an orientation to a domain and the identification of relevant concepts. The interviews are 
carried out with senior managers or team leaders who can explain the daily activities to a non-expert interviewer. 
As a result of these interviews, the interviewer should come up with internal/external processes which can impact 
the company’s objectives. 


The next step of the framework aims to identify critical factors which contribute to the success of the business 
processes. To achieve this, a “Focused” interview (Winfield, 2000) is carried out between an interviewer (ML 
Engineer) and domain experts to extract more detailed knowledge. This interview consists of three parts. First, 
there is an introduction where goals are explained to encourage the expert to take part in the discussion. Secondly, 
a set of topics is carefully chosen regarding previously identified concepts. These topics guide the interviewer to 
identify what is meaningful for experts (future users). Finally, the interviewer needs to evaluate and summarise 
the elicited knowledge before the interview ends. 


The concepts, collected during the Tutorial interviews and Focused interviews, are mapped against the fifteen 
aspects of Dooyeweerd (Basden, 2011). Indeed, it consists of the analysis of each concept to determine if it can 
be defined or interpreted by these aspects. A set of keywords can be considered as references when analysing each 
concept. A keyword can represent an entity, a process, a task, or a system. As a result, the elicited knowledge can 
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be visually illustrated and structured with the following parameters: laws, axioms, data, definitions, and 
constraints. 


After the extraction of knowledge, the latter needs to be conceptualised to the domain and presented to an expert 
for validation. This step helps narrow down the knowledge and identify tangible & non-tangible factors, rules and 
constraints involved in the decision-making process. If the validation failed, the ML engineer needs to organize 
new interviews to clarify the misalignment or collect what is missing from the expert’s knowledge. 


Finally, the ML engineer relies on elicited knowledge to propose an ML design: dataset structure (features and 
observations) and the training objectives. Indeed, it helps in the creation of a multi-variant dataset that was used 
to train a time series model for stock forecasting purposes. Moreover, the collected knowledge guides the 
definitions of specifications for the software development part of the project. Indeed, it shaped the definition of 
models, database tables, workflows and wireframes (UX/UI). 


4. RESULTS & EVALUATION 
4.1. Case study 


In this study, we worked with a wholesale company specializing in food export and distribution across the North 
UK. It operates in a multi-disciplinary environment, where teams from different disciplines work together to 
achieve the company’s objectives. The company has several business processes, such as sales, procurement, 
logistics, accounting, warehouse, e-commerce, etc. that guide its daily activities and contribute to its success. 


In a warehouse context, a procurement is a business process that involves identifying and selecting suppliers, 
negotiating contracts, and managing the purchase of food items to maintain inventory levels, meet customer 
demand and optimize costs. We applied our proposed method to the procurement business process, where we 
extracted knowledge from the team members and design machine learning models. The data and knowledge 
collected from the company’s operations were used to train ML models, which were then deployed to support the 
procurement process. 


Moreover, the company owns a bespoke resource management platform that supports various operations such as 
raising and amending purchase orders, stock management, goods-ins and quality control. The trained ML models 
were integrated into this platform to support the procurement process while offering a technology acceptance by 
the staff and final users. 


4.2. Procurement: Elicited Knowledge 


(1) Identification of business processes: the company’s business model involves several processes, from 
ordering to delivery, which aims to meet the supply-demand needs. Interviews have been conducted with the 
company staff to better understand the existing processes and come up with a supply chain map (Fig. 2). 
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Fig. 2: Supply Chain Map 
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(2) Identification of critical success factors: we selected the characteristics/factors that will contribute to the 
technology acceptance of the new artificial intelligence (AI) platform with regards to the procurement business 
process. 


(3) Visual representation of extracted concepts: several concepts have been identified during the interviews 
with the domain experts. From the procurement concept, we investigated and captured knowledge in terms of 
laws, axioms, data, definitions, and constraints. The following keywords have been selected as reference for our 
investigation: product, supplier, purchase manager, carrier, reference for quotation (RFQ), purchase order, 
manufacturing, delivery, return, and credit note. Table 2 illustrates the knowledge elicited from the entity 
“Product” and mapped against the 15 aspects of Dooyeweerd. 


(4) Identification of tangible and non-tangible factors, extra meta-level rules and parameters: following the 
mapping of the key business processes against the Dooyeweerd’s aspects, we identified data attributes and 
workflows required to design a ML solution. Indeed, the work sessions with experts from the procurement domain 
allow us to come up with tacit knowledge that is illustrated in an activity diagram (Fig. 3) 


(5) Integration of insights into the company business processes: the knowledge elicited from previous steps 
allowed us to gather all specifications required to properly design databases of micro-services that will be part of 
a new software architecture in the company. Moreover, the extracted knowledge allows us to create interface 
insights (wireframes) illustrating each activity of the business process. Indeed, these wireframes (Figure 4) allows 
us to quickly validate our understanding of the business processes and ensure the technology acceptance of the 
future users. 


Table 2: Multi-aspectual knowledge elicited from an entity “Product” 


Laws Axioms Data Definitions Constraints Aspects 
Weights and A product must have | Product quantity, | A product has 
Measures Act a measurable size, weight, cost | properties which 
1985 quantity price, sales price, | can take a discrete 
online price, amount: quantity, (1) Quantitative 
; i i A product should fit 
online offer size, weight, price 
. f with the warehouse 
price, collection 
: shelves 
price 
dimensions/capacity 
Food Safety A product must have 
. Product 
Act 1990 a physical presence A product has a 
. . . dimensions, PA (2) Spatial 
(Food Safety in a specific location shape, position, 
location 
Act 1990) 
Organic A product needs to be 
Products sold before the 
Regulations expiration date. 
2009 (The 
A product must be ae? 
Organic e i A product expiration 
kept in a suitable Product . 
Products . . sme: A product has a date needs to fit with the ait A 
environment with expiration date + . . . (5) Biotic/Organic 
Regulations i life function shelf life 
respect of the shelf Shelf life 
2009), Food i 
life 
Safety Act 
1990 (Food 
Safety Act 
1990) 
General Food A product must be | Product Product can be 
Law stored and | temperature, touched, smelled E . 
i w) (6) Sensitive/ Psychic 
Regulation transported under | humidity and & tasted 
conditions that | light 
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(EC) No 


maintain its sensory 


178/2002 quality 
Food Safety A product must be A product must have a 
Act 1990 analysed and set of chemical and 
pH, nutritional Product can be 
(Food Safety evaluated for its . EN physical properties (7) Analytical 
Act 1990) BEEN bd content, shelf life | distinguished 
physical properties 
Food Standards | A product must be Product Package Some product Product rebranding 
Act 1999 capable of packages are designing should fit 
undergoing designed to meet with each market (8) Formative 
processing or customer segment 
transformation expectations 
Food Safety A product must meet | Product name, Product labelling Product name & code 
Act 1990 industry standards description, code | using symbols need to follow standards 
(Food Safety and regulatory (length, symbols, (9) Lingual 
Act 1990) requirements for languages) 
labelling 
Sale of Goods A product must be Product margin Product has a Product has to be 
Act 1979 priced in a manner benefits, limited value managed with frugality 
(11) Economic 
(Food Safety that reflects its value | profitability and 
Act 1990) growth 
Food Standards | A product must be Customers Product has to A product needs to 
Act 1999 consistent with reviews, bring joy, fun, and | satisfy the customers so i 
customer feedbacks harmony to that they get values for CaA sauce 
preferences customers what they pay for 
Food Safety A product must Product reward, Product has to A product needs to be (13) Juridical 
Act 1990 comply with relevant | recompense bring justice sold in a fair ways 
laws 
Food Safety A product must be Product Product can be A product can be (14) 
Act 1990 produced and advantages, beyond the delivered earlier, Ethical/Attitudinal 
marketed with benefits imperatives discounted 
respect of ethical 
principles 
Food Safety A product must be Dietary A product follows | A product needs to be (15) Pistic/Faith 
Act 1990 produced and restrictions, acommitment and | trustworthy 
distributed with consumer trust 
respect of spiritual preferences 


and cultural beliefs 
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Fig. 3: Activity Diagram — Make a purchase order 
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Fig 4: Procurement (Purchasing) module: this wireframe shows a quick representation of the overview page. 


4.3. Evaluation of the results 


The proposed framework was evaluated in a real-life warehouse environment where knowledge have been 
extracted from procurement experts and used to build historical sales datasets. These datasets were used to train 
time series forecasting models that optimize the procurement business process. We conducted the evaluation in 
two ways: a quantitative analysis and a qualitative analysis. 


Quantitative analysis: to quantify the performance of the proposed method, we conducted a series of experiments 
which consists of training stock forecasting models using the datasets generated from historical sales data. These 
datasets were also enriched with knowledge elicited from procurement experts using our framework. These 
methods helped to identify the learning objectives and the representation of the dataset (features, observations, 
etc). We used two time series forecasting methods like ARIMA (Harvey, 1990) & TFT (Lim et al., 2020) from 
the literature to illustrate how our framework contribute in improving the performance of the models. To evaluate 
the performance of our models we used the following metric: Quantile loss (Wen et al., 2018). Table 3 shows the 
performance of the models trained on a dataset without elicited data (D1) and a dataset with elicited data (D2). 
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Table 3: Comparison of models trained on two datasets, without (D1) and with elicited features (D2), 
w.r.t a 0.5 percentile quantile loss (p50 loss) and 0.9 percentile quantile loss (p90 loss) 


Datasets Elicited Features Model p50 loss P90 loss 
D1 - ARIMA 1.9929 1.9451 
D1 - TFT 0.6138 0.4266 
D2 Yes TFT 0.5825 0.3780 


Each dataset has the following settings: 513484 time points (about 2 years of sales data), 30 days’ horizon, and 
the stock quantity as the target feature. 


Moreover, we did a comparison of models trained on datasets, which involved features extracted with knowledge 
elicitation techniques considered as baselines (ABML, IHTM, Ziva). We used the same time series-forecasting 
model (TFT) to ensure a fair comparison (Table 4). 


Table 4: Comparison models trained on datasets generated with our proposed knowledge elicitation framework 


against baselines. 


Datasets Elicited Methods Model p50 loss P90 loss 
D2 IHTM TFT 0.6827 0.4702 
D2 Ziva TFT 0.6764 0.4629 
D2 Ours TFT 0.5825 0.3780 


Qualitative analysis: In order to evaluate the technology acceptance of solutions developed using our knowledge 
elicitation framework, we selected seven participants. This group included three individuals without prior 
knowledge in procurement and four members of the procurement team. The main objective was to determine 
whether the trained models effectively optimized business processes while considering the needs of the end user. 
Each participant was tasked with creating a purchase order on the system while adhering to two key constraints: 
avoiding stock shortages and preventing overstocking. The participants were instructed to evaluate the system 
based on several criteria, including user experience (UX), user interface (UI), workflow simplicity, and knowledge 
awareness. They rated the system on a scale of 0-5 (bad to good) for each criterion. 


In terms of user experience (UX), feedback from seven participants revealed a generally positive response to the 
system for creating or modifying purchase orders. Five participants rated the experience with a score of 5 out of 
5, indicating satisfaction, while two gave a score of 4, suggesting a desire for added features like shortcuts. 
Regarding the user interface (UI), six participants praised the new design with a score of 5, although one 
participant gave a score of 3 due to colour preferences. Evaluating workflow simplicity, three participants without 
procurement expertise rated it 4 for ease of following step-by-step instructions. In contrast, four procurement team 
members rated it 5 for consistency and accuracy. In terms of knowledge awareness, four participants rated it 5 for 
facilitating decisions on quantity, delivery, pricing, and supplier selection, while three desired more empirical data 
to bolster the system's recommendations. 


5. CONCLUSION 


In this paper, we proposed a multi-aspectual knowledge elicitation framework (MAKE4ML) for optimizing 
business processes through the design of machine-learning models. Our approach involves conducting interviews 
with domain experts and parameterize machine-learning models that can reproduce the expertise of the experts 
and provide insights for decision making. We applied the proposed framework in a food warehouse company to 
optimize the procurement process, resulting in a significant improvement in the accuracy of forecasting. 


This framework allows us to extract concepts that were relevant to the business and useful to optimize the learning 
objectives of the machine learning models. Our approach can be extended to other business processes, enabling 
efficient knowledge elicitation, and contributing to the design of machine-learning models that can optimize 
operations, reduce costs, and increase efficiency. 


Furthermore, we plan to investigate the combination of multi-aspectual knowledge elicitation techniques with the 


active learning. Active learning has been shown to be effective in reducing the amount of labelled data required 
for training machine learning models. We believe that combining active learning and the multi-aspectual 
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knowledge elicitation technique MAKE4ML can lead to even more efficient and effective optimization of 
machine learning models. 


Overall, our multi-aspectual knowledge elicitation framework can be a valuable tool for optimizing business 
processes through the design of machine-learning models. By leveraging the knowledge and expertise of domain 
experts, we can develop more effective machine learning models that can lead to cost savings, improved 
efficiency, and better decision-making. We hope that this paper provides a valuable contribution to the field and 
inspires further research in this area. 
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ABSTRACT: The theme of ‘Managing the digital transformation of the construction industry’ emphasises the 
importance of considering various dimensions of digitalisation and optimising the built environment. This review 
aims to present methodological approaches from existing literature that elucidate location-related factors 
impacting the capital cost of data centres. These findings facilitate adjustments to historical cost data when 
estimating total costs for new data centres. A systematic literature review method was employed to ensure an 
objective and comprehensive synthesis. In conjunction with Bayes's theory, this review identifies that a Delphi 
methodology is the most suitable methodological approach for forecasting and modelling capital expenditure for 
hyper-scale data centres. The methodology enables collective decision-making and consensus building, 
recognising the stakeholder's pivotal role in shaping the future of data centres. These findings offer valuable 
insights for researchers and practitioners in forming a methodological approach for further investigations into the 
location-related factors impacting the capital cost of data centres. Embracing this knowledge allows us to align 
research and practice, ensuring that these practices become integral to shaping the future of data centres and the 
digitalisation and optimisation of the built environment. 


KEYWORDS: cost; decision analysis; forecasting, data centres 


1. INTRODUCTION 


The rapid expansion of digital technologies requires buildings (called Data centres) to house information 
technology (IT) equipment to store and process data and services required by digital transformation, including the 
internet. Due to the advantages such as advanced technological progress in the sector and the cold climate 
conditions, certain regions of the world, such as the Nordic regions, are preferred by investors to build Data centres. 
This presents unprecedented challenges to construction cost consulting professionals in providing reliable capital 
cost estimates as early as a potential (international) location is identified. In the very early stage of a project 
opportunity, cost consultants provide capital expenditure input to support development appraisal exercises which 
estimate the residual land value and input to the Order of Cost estimate involved ‘in determining the possible cost 
of a building(s) in relation to the employer’s fundamental requirements’ (RICS, 2013). 


As these activities occur before preparing a complete set of working drawings (RICS, 2013), capital expenditure 
is estimated by benchmarking cost data from previously completed similar projects. This involves comparing and 
contrasting the difference between historical and proposed projects concerning the cost-significant variables such 
as location, building size, market conditions and their impact on capital expenditure. Existing literature reveals 
generic cost modelling approaches that could be used in early cost estimates and details of cost-significant 
variables that need to be considered during cost modelling (Parameswaran et al., 2019; Hashemi et al., 2020). 


However, as data centres are relatively new to the construction sector and their design and construction 
significantly depend on the location (King et al., 2023), the suitability of the generic cost modelling approaches 
has yet to be widely investigated. Therefore, particularly regarding the conference theme and the growth of the 
internet, more research is required to establish the impact of site location on the capital expenditure of hyper-scale 
data centres; this will assist in selecting the correct location to make informed decisions and reduce the financial 
risk and contingency estimate to ensure a more accurate construction cost. This paper aims to present findings of 
a systematic literature review to determine the theoretical and methodological approaches in existing literature 
concerning the location-related factors affecting the capital cost of data centres that could be used to adjust 
historical cost data during their use in estimating the total cost for new data centre projects. 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


2. MATERIALS AND METHODS 


2.1 Approach 


A systematic approach has been used to identify and synthesise the literature results to ensure an accurate, unbiased 
synthesis. It is an approach where literature on a complex topic has been conceptualised and studied differently 
among researchers (Greenhalgh et al., 2005). This review identifies methodological approaches, geographies, 
historical development, quality, and literature validity. 


2.2 Scoping Strategy 


The literature search strategy utilised a scoping review based on that as derived from PRISMA (Tricco et al., 2018) 
and to provide rigour to justify further research (McInnes et al., 2018). The search strategy used the advanced 
search tool with Boolean keyword operators. In total, 1,375 studies were identified. After an initial review of the 
abstract of the papers, 508 were identified as being focused on construction, data centres and cost variables. From 
those identified as suitable, 87 were identified as duplicated, reducing the number of papers for review to 421. As 
Suarez-Almazor et al., (2000) suggested, it is vital to utilise a second database to identify potential inconsistencies. 
In addition, it may further enhance and support the literature review with newly identified literature. Using the 
same search criteria as the stage 1 search, a further 1,623 studies were identified; after an initial review of the 
abstract of the papers, 402 were identified as being focused on construction, data centres and cost variables. From 
those identified as suitable, 251 were identified as duplicated from the initial stage 1 search, further reducing the 
number of papers for an abstract review to 151, bringing the total for abstract review to 572. Following an abstract 
and full text review a total of 161 studies were selected for final review, as Figure 1. 


Stage I Stage 2 
| Scopus Google Scholar 
| e I 
Instia! search Ininal search 
1375 studecs 1623 studies 
_——— a 
After exclusion criteria After exctusion criteria 
508 studies 402 studies 
m Y 
After duplication After duplication 
421 studies 151 studies 
ES ea 
After abstract review After abstract review 
293 studs 71 studics 
UN ee 


After full-text review 
22 studies 


After full-text review 
139 studies 


161 studics 
for final review 


Fig 1. Systematic approach for literature 


2.3 Validity and quality of literature 


To assess validity and quality, the papers have been analysed and identified against peer-reviewed literature and 
grey literature, as it is recognised that the inclusion of grey literature in systematic reviews provides rigour and 
balance of recognised sources of information (McAuley et al., 2000; Blackhall, 2007). Whilst grey literature means 
many things to many people (Mahood et al., 2014), this review identifies grey literature as being book chapters, 
conference proceedings and trade publications. According to McAuley et al., (2000), the review process for a 
meta-analysis should strive to locate and incorporate various reports, including both published and grey literature, 
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that satisfy pre-established criteria for inclusion. In our systematic literature review, we comprehensively searched 
literature and identified 161 papers for final review. The review process assessed the literature's validity and 
quality, including both peer-reviewed and grey literature sources such as book chapters, conference proceedings, 
and trade publications as identified in Table 1. We found that 84% of the selected literature was peer-reviewed 
journals, while the remaining 16% comprised other sources. 


Table 1. Sources of literature 


Source Frequency % of total 
Book Chapters 9 6% 
Conference proceedings 10 6% 

Peer reviewed journals 136 84% 
Trade publications 6 4% 
Totals 161 100% 


3. RESULTS AND DISCUSSION 


3.1 Methodological Approaches to Cost Modelling 


To fully understand the methodological approaches utilised in research, provide data on their use by researchers in 
previous studies; this identifies the approach taken in each study for synthesising the data that may be useful for 
future studies. Analysing the abstracts identified methodological approaches in the selected literature from the 
scoping strategy; this meta-narrative has shown that the prediction method has the highest count across all sectors. 
The prediction method has been used significantly in modelling data centre costs. The other vital approaches 
include machine learning, heuristic, stochastic method, parametric modelling, AHP, Regression Analysis and 
Monte Carlo simulation. It is worth noting that some papers identified Machine Learning, and some artificial 
neural network techniques, whilst others used neural network techniques. Due to the similarity of the techniques 
and neural networks forming a subset of machine learning, we have grouped these in the Machine Learning 
category. Likewise, several papers identified similar techniques whilst others identified heuristic techniques; again, 
we have grouped these in the heuristic category due to the similarity of these techniques. 


When analysing what methodological approaches are specific to the data centre sector by eliminating other 
construction sectors resulting from the scoping search, the results identified 59 different approaches related to data 
centres. These results demonstrate that prediction methodology holds the highest vote count. This methodological 
approach aligns with the vote count trend for the prediction method. It is acknowledged that prediction theory is 
not an absolute exact science and ‘can be compared to weather forecasting, stock market predictions or ‘betting 
on how fast a 100-meter foot race will be run’ (Line, 2008). Prediction theory also requires a substantial quantity 
of data to enable prediction. Advanced modelling techniques are extensively used in cost modelling to improve 
accuracy. One of the most recent advancements in Machine Learning-based approaches. According to a recent 
systematic review (Hashemi et al., 2020), ANN and Regression Analysis were identified as the most widely used 
ML-based cost modelling techniques, followed by hybrid models such as ANN with fuzzy logic, CBR and GA 
(Genetic Algorithm). Machine Learning involves developing a machine-based system that can learn from data. A 
large volume of historical data is paramount for a machine-learning model. 


As data centres are relatively new, developing a machine learning-based model is not feasible at this early stage 
when historical cost data is limited. Fazil et al., (2021) demonstrate that obtaining a reasonably accurate neural 
network prediction is possible even when insufficient information is available during the initial design. However, 
Gunaydin and Dogan (2004) argue that the accuracy that a cost estimation neural network model strongly relies 
on the quality and quantity of data samples used. They claim that more data samples lead to less prediction error. 
Therefore, to create an accurate cost prediction model for building projects, it is necessary to have reliable and 
high-quality cost data for various types and conditions of buildings. Case-based reasoning is another potential 
method for cost prediction, which involves retrieving information from historical data on similar or identical cases. 
However, there are challenges associated with the retrieval process, such as computing similarity measures. 
According to Rashid's research (2017), case-based reasoning is an effective method for predicting costs as it 
involves analysing past cases' attributes, thereby enhancing cost prediction accuracy. However, these models 
mainly rely on historical cost data. In the UK, the Building Cost Information Service (RICS, 2018) offers 
information on construction projects and their corresponding tender prices, and cost managers use this data to 
estimate the cost of a building based on the cost of a similar project with adjustments to reflect any differences. 
However, it does not enable generalisations about the relationships between cost and significant predictors. Lowe 
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et al. (2016) conducted a research study, creating a dependable regression cost model that can be used to estimate 
the construction expenses associated with a building's final account. They highlight that, aside from its practical 
usefulness, creating such a model serves two other purposes. First, it provides a benchmark for evaluating the 
effectiveness of neural network models, and second, it helps identify the variables that display a significant linear 
correlation with cost. However, the effectiveness of these prediction methods has its limitations. 


Regression techniques require a substantial quantity of statistical information, and their precision is affected by 
the supposition that the independent variables are both independent of each other and normally distributed (Son et 
al., 2012). In contrast, according to Zhang (2003), neural networks possess a crucial benefit over regression models 
because they can model nonlinear connections without relying on assumptions. Regression methods demand a 
significant amount of statistical data, and their accuracy is influenced by the assumption that the independent 
variables are independent and normally distributed (Son et al., 2012). In contrast, the primary advantage of neural 
networks over regression models is their capacity to model nonlinear relationships without relying on any 
assumptions (Zhang, 2003). However, building a neural network model also requires data, and designing an 
optimal network structure involves a costly trial-and-error process. Therefore, according to Son et al. (2012), there 
is a notable need for prediction techniques that are more robust and reliable. Likewise, acquiring input data for 
preparing estimates can be challenging. According to Hashemi et al., (2020), in cases where the extent of the work 
could be better understood, it could result in inaccurate and approximate cost estimates. Whilst it is acknowledged 
that few studies focus specifically on selecting suitable sites for data centres (Kheybari et al., 2020), the search 
identified one Delphi study for data centre projects in China as a method for selecting data centres for several 
cities. However, the main findings identified proximity and geographical locations as having the only impact (Yang 
& Ye, 2011). According to King et al. (2023), in the absence of data for assessing the impact of location variables 
for hyperscale data centres, a consensus will need to be obtained from industry experts to obtain the data. 


Whilst Delphi has the lowest vote count, as an approach to forming a consensus, Delphi is an appropriate route. 
This literature review has identified that utilising voting as the ameliorated nominal group technique could be an 
alternative use of Delphi. According to Brauers (2018), the nominal group technique may help generate ideas about 
objectives that could be included in an initial version of the Delphi method. This could facilitate convergence 
towards a final list of objectives. Whilst other top-voted methodological approaches require a substantial amount 
of data to establish and make predictions for capital expenditure, a Delphi study is well suited to establish 
consensus to identify the impact of location variables in the case of Data centres where available published data is 
limited. Some scholars argue that the Delphi method lacks a well-established framework (Crisp et al., 1997; 
Sharkey & Sharples, 2001; Broomfield & Humphris, 2001; Turoff & Linstone, 2002; Campbell et al., 2004; Hsu 
& Sandford, 2007). 


However, Delphi could be used only to identify location-related variables impacting the capital costs of data 
centres. In addition, the Delphi technique can also be integrated with Bayes theory to update established opinions 
through the probability of arriving at different outcomes, as expert opinions are collected through a structured 
sample collection technique to estimate these probable outcomes. Bayesian statistics is based on the theory 
produced by Thomas Bayes (1763); it is characterised by a joint treatment of all quantities of interest in a statistical 
model as random variables. In particular, Bayesian statistics naturally incorporate the uncertainty analysis 
surrounding the estimates or forecasts described in terms of probability distributions, As Figure 2. 


P(A) - P(B\A) 


P(AIB) = B 


Figure 2. Bayes theory 


e P(B) denotes the prior belief (for example, the probability of occurrence of the variable, such as the 
probability of encountering ground conditions) 


e P(B|A) denotes the level of impact should that variable occur 


e P(B) denotes the new evidence 


The information obtained by the Delphi study can be fed into the Bayes formula to render current outcomes based 
on the updated information as provided by a qualitative assessment of the perceived impact of location variables. 
The combination of Bayes theory and the Delphi method enhances the accuracy and decisiveness of the 
mathematical model when compared directly with Prediction Theory. It is worth noting that whilst most literature 
identifies the Delphi method as a tool for knowledge elicitation, it is in the author's opinion that Delphi is a 
methodological approach in its own right due to its systematic nature, potential for quantitative analysis, iterative 
feedback process, incorporation of expert judgment, and consideration of uncertainty make it comparable to other 
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methodological approaches, this is also supported by the seminal work of Hasson et al., (2000). For example, while 
primarily used for knowledge elicitation from experts, the Delphi method is a systematic and structured approach 
to gathering and aggregating opinions and judgments. It involves multiple iterations of anonymous surveys or 
questionnaires to collect insights from a panel of experts. While other methods might use probabilistic models, 
Statistical analysis, or simulation techniques to quantify uncertainty, the Delphi method focuses on expert 
consensus and convergence to address uncertainty. These different approaches to uncertainty management can be 
compared and evaluated based on their effectiveness and suitability for a particular cost modelling context. 


To assess the validity of the findings, we analysed book chapters, conference proceedings, peer-reviewed journals 
and trade publications against the data centre sector and the relationship between the various methodological 
approaches. This indicates that 77% of the findings were from peer-reviewed journals, with 23% being from grey 
literature. As a further analysis, we reviewed the country of research to establish if there were any other research 
gaps in specific regions or countries; this highlighted that there needs to be an identified approach in the UK. 
Whilst the list of methodological approaches identified is informative, it is essential to highlight our study's 
significant contributions and novel aspects compared to previous research in the broader field of cost modelling. 
Unlike previous studies, our research specifically focuses on the context of data centres, a relatively new domain 
within the construction sector. Data centres present unique challenges due to their dependency on location factors. 
Therefore, our study investigates the impact of site location on capital expenditure, addressing a crucial knowledge 
gap in the literature and aligning itself accordingly with constructing for the future. By exploring this specific 
context, we provide valuable insights that can assist decision-makers in making informed choices, mitigating 
financial risks, and enhancing the accuracy of construction cost estimates for data centres. 


3.2 Location Specific Factors 


We have examined whether there is a relationship between location-specific factors and location-specific factors 
influencing cost models, or do cost models influence location choices? This relationship is a crucial matter of 
concern in the decision-making process, as it involves understanding whether location-specific factors influence 
cost models or if cost models influence location choices. There are two key influences, 1) The influence of 
location-specific factors on cost models and 2) the influence of cost models on location choices. For example, high 
land prices in certain areas may increase site acquisition costs, affecting the overall project budget. Similarly, 
regions with high labour costs may result in higher construction expenses. Additionally, proximity to reliable 
power sources or fibre optic networks can impact energy costs and connectivity expenses. 


Understanding the influence of these location-specific factors on cost models is crucial for accurate budget 
estimation and financial planning during the decision-making process. By incorporating this knowledge into the 
cost models, stakeholders can make informed choices regarding the site location, considering the potential impact 
on capital expenditure. Secondly, cost models can also influence location choices for data centre projects. These 
cost models allow stakeholders to evaluate potential site locations' financial viability and profitability based on 
projected construction costs, operational expenses, and expected returns on investment. Cost models typically 
consider political influences, land and construction costs, energy expenses, maintenance and operational costs, 
taxes, and potential revenue streams (Baloi & Price, 2003). By analysing cost models, stakeholders can compare 
different location options and assess the financial implications associated with each choice. This analysis enables 
them to prioritise locations that align with their budgetary constraints and desired profitability targets. They can 
provide insights into the cost-effectiveness of various site locations and guide decision-makers in selecting the 
most favourable option. The relationship between location-specific factors and cost models in data centre 
construction is bidirectional. Location-specific factors influence cost models by directly impacting various cost 
components. Simultaneously, cost models play a crucial role in guiding location choices by providing financial 
insights and evaluating the viability and profitability of potential sites. 


In addition, we have compared the data centre sector to other sectors, demonstrating that other sectors also consider 
location and location-specific factors when cost modelling. For instance, in the retail industry, location plays a 
crucial role in determining the viability and profitability of a store, as researchers have found that factors such as 
population density, income levels, competition, and proximity to transportation hubs significantly influence the 
cost modelling approach for retail establishments (Kerin & Harvey, 1975; Brown, 1993). Similarly, in the real 
estate sector, location-specific factors are vital for estimating property values and rental rates, with research 
suggesting that variables such as neighbourhood quality, accessibility to amenities, proximity to schools, and crime 
rates directly affect residential and commercial properties (Klimczak, 2010). Furthermore, in the transportation 
sector, location-related factors impact cost modelling approaches, such as when estimating the costs of 
constructing highways or rail networks, factors such as topography, soil conditions, presence of natural obstacles, 


384 


and proximity to existing infrastructure play a significant role (Daniels & Mulley, 2012). These examples 
demonstrate that various sectors, including retail, real estate, and transportation, recognise the influence of location 
and location-specific factors when cost modelling. 


4. CONCLUSIONS 


By analysing the methodological approaches through the systematic review, we have established trends in the 
literature and identified what methods are being utilised together. For example, we have identified the Delphi 
method as a structured and iterative approach that involves collecting and synthesising expert opinions to make 
informed decisions. In investigating the impact of site location on capital expenditure, the Delphi method can help 
gather insights from a panel of experts regarding the relationship between location factors and construction costs. 
By utilising the Delphi method, we can tap into the collective wisdom of experts in the field and gain insights into 
the impact of site location on capital expenditure. The Delphi method helps to mitigate biases and provides a more 
comprehensive understanding of the relationships between location factors and construction costs. Likewise, 
Bayes's theory is a statistical approach that allows for incorporating prior knowledge and updating probabilities 
based on new evidence. It provides a framework to quantify uncertainty and make probabilistic inferences. 
Applying Bayesian theory to investigate the impact of site location on capital expenditure involves formulating 
and updating probability distributions based on available data and expert opinions. By applying Bayesian theory, 
we can incorporate prior knowledge and new evidence to quantify the impact of site location on capital 
expenditure. This approach allows for a more nuanced and probabilistic assessment, considering the inherent 
uncertainties in the relationship between location factors and construction costs. The Delphi method and Bayesian 
theory provide valuable tools to investigate the impact of site location on capital expenditure for hyperscale data 
centres. 


The Delphi method leverages expert opinions and consensus-building, while Bayesian theory incorporates 
statistical analysis and the integration of prior knowledge and data. Combining these approaches can provide a 
comprehensive understanding of the relationship between site location and construction costs in data centre 
projects. To conclude, it has been identified through this meta-narrative analysis that the synthesis of both Delphi 
Methodology and Bayes Theory is a robust methodological approach to identifying the location-related factors for 
hyperscale Data centres where variables are not fully known. The development and growth of data centres and the 
result of this research are essential to how we manage the construction industry's digital transformation. 
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ABSTRACT: Off-site construction (OSC), including prefabrication and Modular-integrated Construction (MiC), 
is gaining popularity as a school of sustainable construction methods that can improve productivity, quality, and 
waste reduction. However, OSC projects face challenges related to technical integration, collaboration among 
stakeholders, and dynamic uncertainties. As a result, high-quality standards throughout the manufacturing process 
of OSC products are difficult to ensure, leading to costly rework, delays, and safety issues. This paper applies a 
value stream mapping (VSM) approach based on lean principles to OSC production for identifying lean 
management opportunities for off-site construction production. The case studied in this paper is reinforced 
concrete slabs. First, we employed a combination of field investigations and interviews to formalize the flow of 
materials and information for the case. Then, VSM processed the flow for the current state map, which highlights 
twelve opportunities to prioritize for slab production, e.g., the adoption of digital technology (VR, BIM) in 
information flows. The findings in value-added activities improvement of the opportunities demonstrate the 
potential of lean management in the slab production case. Furthermore, the VSM approach in this paper can 
identify the ‘wastes’ in lean theory, which are the control points of OSC production, for enhancing quality, 
efficiency, and resource utilization. The findings contribute to the existing body of knowledge by providing 
empirical evidence of the VSM approach to the identification of lean management opportunities for OSC 
production. 


KEYWORDS: Off-site construction, Prefabricated products, Lean management, Value stream mapping, Quality 
assurance, Construction industrialization, Lean construction 


1. INTRODUCTION 


Off-site construction (OSC) generally involves standardizing design, manufacturing components in the factories 
and assembling during construction (Hosseini et al., 2018). With the upgrades of industrialization (Mitolo et al., 
2021), examples of OSC are Bricks, 2D slabs, 3D volumetric, and Modular-integrated Construction (MiC). OSC 
is recognized as a promising school of sustainable construction methods that integrates design, production, and 
construction to save energy, eliminate environmental wastes and maximize the value of the whole life cycle of 
construction products (Li et al., 2021). In the literature, OSC is confirmed capable of improving labour productivity 
and product quality, enhancing energy conservation and emission reduction, thus solving prominent problems such 
as strengthened resource constraints and labour shortages. On the other hand, OSC also faces challenges, such as 
complex technical integration, collaboration of multiple stakeholders, and dynamic uncertainties (Zhao et al., 
2023). Thus, OSC requires the coordination and integrated application of a wide range of multidisciplinary and 
inter-organizational skills. 


OSC production refers to manufacturing precast construction elements in the factory, which is the most important 
stage for the quality assurance of products, as the construction site could be responsible for limited rework and 
repairs (Wu et al., 2019). Unlike traditional manufacturing factories, OSC production is a complex system 
involving different stakeholders and various information interactions (Khalili & Chua, 2014). Therefore, the 
complexity also leads to quality risks frequently occurring throughout the entire OSC products production 
processes, including production planning, engineering design, mould production, components manufacturing, 
quality inspection, storage, and loading for logistics (Lee et al., 2016). Furthermore, the connections between the 
stages may amplify the risks from one stage to another, e.g., deformation occurs in the moulds at the beginning of 
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the production will lead to defective components, but components may meet inspection requirements in the 
following product stage, and such mistakes can not be identified until the inspection of the finished products 
(Heravi & Firoozi, 2016). Therefore, systematic quality management is vital to OSC production. 


Current approaches adopted in OSC production management are limited (Lu et al., 2018). For example, in the 
inspection process, the inspectors conducted manual spot inspection and made subjective decisions stemming from 
their prior knowledge in identifying products’ flaws early in the production process, which inevitably suffers from 
labour-intensive consumption and subjectivity of opinions rather than precisely matching the substantial conditions 
(Xue et al., 2018). Consequently, defects are not perceived until the later stage of logistics or even the on-site 
installation stage (Park et al., 2013). Defects that arise during the production stage are costly for owners, 
contractors, and prospective clients because they are nearly impossible to chain significantly during the assembly 
stage. 


Lean management concepts have the potential to increase quality, efficiency, and resource utilization for OSC 
production (Sacks et al., 2010). In lean principles, value is created by a sequence of value-adding activities. A 
value stream is the sequence of activities by which a company provides a product or service that delivers a specific 
product or service to a specific customer (Koskela, 1993). Value Stream Mapping (VSM) is an approach for 
illustrating the flow of materials and information in a lean manufacturing system (Ko & Kuo, 2015). It utilizes the 
tools and techniques of lean manufacturing to help organizations identify where waste is and, in turn, streamline 
the flow of production. The purpose of value stream mapping is to identify and reduce waste in the production 
process (Rahani & Al-Ashraf, 2012). Waste in this context is defined as any activity that does not provide added 
value to the end product and is often used to illustrate the total amount of waste reduced in the production process. 
Managers, engineers, process planners, suppliers, workers in the manufacturing industry, and customers can all 
benefit from value stream mapping by identifying waste and determining where to start looking for its main causes. 
In this way, value process mapping can also be a communication tool in linking with stakeholders of the factories 
(Chen et al., 2008). In general, the adoption of VSM based on lean concepts can enable the effective utilization of 
resources, e.g., in the automotive industry, supply chain analysis, and manufacturing industries (Rahani & Al- 
Ashraf, 2012). 


This paper aims to apply the VSM approach based on the identification of lean management opportunities in OSC 
production. The objectives of the paper are: 1) To map the OSC production process; 2) To apply VSM to the map 
for locating the ‘wastes’ in the existing process; 3) To identify corresponding management opportunities for OSC 
production. 


2. LEAN PRINCIPLES AND VALUE STREAM MAPPING 


Lean manufacturing principles have been used for a long time in the manufacturing sector to boost output, 
productivity, remove waste, and deliver value (Sacks et al., 2020). The term "lean principle" describes the ongoing 
process of raising the value of a product by removing waste without sacrificing output, productivity, or quality 
(Koskela, 1993). Reducing non-value-added activities and achieving value-added delivery in the construction 
domains are the major areas of concentration for OSC. To address resource difficulties, lean principles' practices 
pledge to offer solutions that are lasting. Once the lean techniques were applied to the project scenarios, it became 
clear that significant amounts of materials were saved. 


VSM is the whole of the actions—both value-added and non-value-added—currently required to move a product 
through the primary flows that are fundamental to every product. It is the production flow, from the raw material 
to the end customer, as well as the design flow, from conception to realization (Grewal, 2008). Firstly, it is a tool 
to identify waste and problems. It reviews operations and processes from a macro perspective, from the input- 
output process, and allows managers to easily identify sources of waste (excess inventory, heavy work, time 
wastage, handling, inspection, etc.), thus providing a scientific basis for continuous, systematic improvement. 
Secondly, it provides a common language. Value streams can be used as a common language for process and 
process improvement, making it easy to communicate between different departments. It is a method of determining 
and differentiating priorities for improvement, avoiding "picking the easy ones" for improvement and maximizing 
the return on investment. The VSM is the basis for the preparation of improvement plans and their implementation. 


The integration of lean principles and OSC has been beneficial in optimizing its production processes and has 
recognized great potential in effectively tackling resource waste and unsatisfactory quality. As a result, numerous 
researchers have used VSM in conjunction with lean concepts to increase productivity and efficiency in the OSC 
manufacturing process as well as other areas. According to Wu and Pheng (2011), value stream mapping of precast 
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manufacturing is essential to achieving sustainability goals. In the case of Yu et al. (2009), by applying the value 
streaming on a consistent work/product flow rather than on individual tasks, the delay brought on by a conflict 
between the predetermined schedule and the actual schedule of the complicated process of building a house can 
be decreased. To identify waste during the construction of concrete slabs in residential constructions, Fontanini et 
al. (2013) employed value spectrum mapping with lean concepts, increasing efficiency and effectiveness. However, 
lean principles have their roots in the manufacturing sector, where they have a mature theoretical and research base 
(Grewal, 2008). There is a lack of an organized framework that can be used to promote the implementation of lean 
manufacturing principles in the production of OSC products. 


In summary, VSM indicates the key operating procedures that adds value to the targeted product. Instead of 
depending solely on the currently established conventional working hours, the use of VSM is combined with field 
study, information gathering, and interviews, capturing the flow of material and information. Cycle times, line 
change times, operator counts, size of batches, and quantities of semi-finished products in a process are a few of 
the forms of data that are capable of being exploited to analyse the lead time and added value time of the entire 
process. So, in addition to the VSM approach, both field research and interviews are employed in this paper. 


3. RESEARCH METHODS 
3.1 Case selection 


The case selected in this paper is the reinforced concrete (RC) slab in the prefabricated construction. The selection 
was based on the product family matrix defined in the VSM approach (Fontanini et al., 2013). Usually, RC slab 
production accounts for >80% in both prefabricated construction and RC MiC (Loss et al., 2016), which fulfil the 
requirements of the product family matrix. The case factory is in Shenzhen, which has major products including 
various types of precast elements such as facades, partition walls, floor slabs, staircases, columns, integrated 
kitchens and washrooms. The involved projects include residential, commercial buildings, roads, and tunnels. 


3.2 Production process mapping 


The scope of activities includes added-value activities at each level, from the raw materials to the finished products, 
which includes conceptual design, product design, and process designs (Sacks et al., 2020). In OSC production, it 
is the in-factory value stream, i.e., the manufacturing stream from design to ready-to-transport stage. 


In the site investigation, observations were made regarding the plant layout, the machinery used, the sequence of 
activities and the time taken for each activity by visiting the site and also interacting with the staff working there. 
The typical activities are (A) mould assembly and preparation, (B) fixing of rebars, (C) concreting and curing, (D) 
de-moulding and (E) transferring for storage. 


Mould assembly and preparation. Casting begins with the assembling of the mould. The front-work used to pour 
and cast concrete is called a mould. Using overhead cranes, the mould plates are removed from the inventory area 
and set down on the fixed table. Using bolt and weld connections, mould assembly and preparation entails 
assembling the mould plates. Based on the shop drawings that the design team produced, the assembly is carried 
out. The timeline of the activities taking place at the mould assembly is shown in Fig.1. 


Fixing of rebars. The rebars are then fixed into the mould as the following step. Cutting, bending, and forming 
the reinforced cage in accordance with shop drawings are all steps in the fabrication of the rebar cage. A separate 
reinforcing cage is created, and then it is added to the mould. The Fig. 2 is the order of activities that take place as 
the rebar cage is transferred to the mould. 


Concreting and curing. Concreting is the following step. Before the concreting process begins, the raw materials 
are fed into the mixer. Both the mixing and transporting of concrete are automated and simple to use. The system 
of the flying bucket is employed. The Fig. 3 is a listing of activities that take place during concreting and curing. 


Demoulding and transferring to storage yard. Demoulding is followed by moving the precast concrete 
component with a crane to the storage location. At the storage location, curing is carried out using a sprinkler 
system that operates for three days. The storage location keeps the finished components for further transportation. 
Before delivering the components to the construction site for assembly, it is important to (1) inspect the physical 
state of the final product, (ii) verify the important parameters, (iii )apply the proper identification markings to the 
elements that indicate position, individual category, weight, size, and placement in accordance with the shop 
drawing, and (iv) check whether the components have reached 75% of their design strength in concrete. 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


Fig. 1: Mould assembly and preparation of a prefabricated RC slab. 


Fixing the loop box with 
the reinforcement cage 
which is used while lifting 


«Checking whether the «Correct positioning and 
rebar size, spacing and lap properly securing of rebars 
electrical conduits, lifting 

the components using 
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crane 


Fig. 2: Fixing of rebars of a prefabricated RC slab. 
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Fig. 3: Concreting and curing of a prefabricated RC slab. 


3.3 Non-value-added activities located using VSM 


This step aims to apply VSM to locate the non-value-added activities. Non-value-added activities, referred to as 
waste, are activities that do not directly contribute to the product's value or the customer's perception of value. 
These activities consume resources (time, materials, labour) but do not enhance the product or service in any 
meaningful way. As a result, by identifying the non-value-added activities and location of waste, the potential 
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improvement could be a promising action. The categories and identification of non-value-added activities are listed 
below (table 1). 


Table 1: Categories and identification of non-value-added activities. 


Categories of non-value-added 


RAS Identification 
activities 
; Unnecessary movement or transportation of materials or products between different process steps, 
Transportation i h . 
leading to delays, damage, and increased lead times. 
; Excess inventory or work in progress that is waiting to be processed, increasing capital and space 
nvento 7 FERDO s 
R without providing immediate value. 
Waiti Idle time for products, materials, or employees waiting for the next step in the process, reducing 
altın; 2 
5 efficiency 
Over-production Producing more than what is currently needed, which leads to excess inventory and waste. 


; Performing more work than necessary, such as using high-precision processes for tasks that don't 
Over-processing 


require it. 


Defects Activities required to fix mistakes, rework, or correct defects, increasing costs and schedule delays. 


3.4 Identification of corresponding management opportunities 


The goal of this step is to adopt VSM analysis to identify opportunities for improvement. The identifying process 
is based on two basic processes: the information flow and the material flow. The information flow is the process 
that starts when the marketing department receives an order or has already forecasted the customer's needs and 
makes it into a purchasing plan and a production plan. The material flow refers to the physical process, i.e., the 
process that starts with the supply of raw materials into the warehouse from the supplier, followed by the outbound 
manufacturing, the finished products into the warehouse, until the product reaches the customer. In addition, the 
material flow includes the inspection and storage of products. Furthermore, there are eight waste principles on 
which judgement is based, including waste of repair, waste of over-processing, waste of movement, waste of 
handling, waste of inventory, waste of making too many/too early actions, waste of waiting and waste of 
management. Therefore, in the analysis of this paper, any part of a product that exceeds the minimum amount of 
resources necessary to add value to the product is considered waste — a waste is not only an activity by definition 
that does not add value but also a process that overuses resources. 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


4. RESULTS 


Fig. 4 shows the results of process mapping, the adoption of the VSM analysis for locating wastes in the current 

process map, and the current state map with information flow, material flow and lead time ladder. The information 

flow depicts the communication between the management team, the production team, the supplier, and the client. 

The material flow displays the movement of materials through different steps from supplier to finished product. 

Cycle time denotes the duration of each step, whereas lead time indicates the length of time it takes to produce 

precast components from an order to their completion. While cycle time solely includes time for value-added 
activities, lead time contains time for both value-added and non-value-added activities. The sequence of activities 

is named from A to E. Evidences of the current waste locating. 1) Waiting. A curing period of 12 hours is required 
when employing the water pond method, thereby extending the waiting duration for subsequent procedures. 2) 

Defects. Damage to mold plates occurred when the welded connection was disrupted during the demolding process. 
3) Overprocessing. Remediation of damaged finished components was carried out through epoxy injection, thereby 
elongating the lead time. 4) Motion. The Reinforcement cage production area is situated 200 meters away from 
the molding table, resulting in an increased travel time for the crane during the transfer to the table platform. 5) 
Transport. The storage area for mold assembly parts is situated at a distance of 400 meters, leading to an extended 
travel time for the EOT crane during the transfer to the table platform. 


Information Flows 


Material Flows 


Lead Time Ladder 


Fig. 4: VSM of the RC slab production. (Note: Process A- Mould assembly and Preparation, Process B- Fixing 
of Rebars, Process C- Concreting and curing, Process D- De-moulding and Process E- Transfer to storage) 
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CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 
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3. Resilient daily scheduling 10. Layout optimization 
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12. Rework, reuse, and recycle 
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7. Clear standard operating 
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8. Accurate information support 
for the decision-making 


Fig. 5: 12 opportunities for potential improvement 


Twelve opportunities are highlighted, as shown in Fig. 5, in information flows and material flows based on the 
VSM of the RC slab production. In the information flows, eight opportunities are concluded. Opportunity 1: real- 
time progress tracking, such as the adoption of Virtual Reality (VR) (Rahimian & Ibrahim, 2011) and Building 
Information Modeling (BIM) (Xue et al., 2021), enables the off-site production team to monitor every step of the 
production process as it happens. This visibility facilitates quick identification of bottlenecks, inefficiencies, or 
delays. With the assistance of a real-time view of progress, the stakeholders can take proactive measures to 
optimize workflow, allocate resources more effectively, and prevent unnecessary downtime. Opportunity 2: 
implementing a standardized feedback loop creates a systematic process for collecting insights and suggestions 
from various stakeholders involved in the factory. The information loop contributes to continuous improvement, 
as the feedback is used to identify areas for refinement and innovation. By incorporating valuable input from 
workers, the production process becomes more streamlined, and waste is reduced through collaborative problem- 
solving and optimized processes. Opportunity 3: a resilient daily scheduling approach acknowledges the dynamic 
nature of off-site construction and prepares for potential disruptions. By building flexibility into the schedule, the 
stakeholders can better adapt to unexpected changes without causing significant interruptions. Opportunity 4: 
timely quality assurance data uploads play a crucial role in maintaining high-quality production standards. By 
promptly uploading quality assurance data, any deviations or defects are identified early in the process, allowing 
for timely corrective actions. This reduces rework, minimizes waste associated with defects, and ensures that the 
final product meets or exceeds quality expectations. Opportunity 5: adopting a just-in-time task delivery approach 
optimizes the timing of task completion to match actual production needs, preventing the accumulation of excess 
inventory or work in progress. Opportunity 6: emphasizing the reduction of learning curves through proper 
training and skill development enhances the expertise and efficiency of the production workforce. Skilled workers 
are less likely to make errors or require additional time to complete tasks. Opportunity 7: establishing clear and 
well-documented standard operating procedures (SOPs) provides a structured framework for the entire off-site 
production team to follow. These SOPs guide consistent and standardized practices, reducing variations that can 
lead to errors or inefficiencies. Clarity in procedures minimizes waste associated with defects, misunderstandings, 
and unnecessary deviations from the optimal workflow. Opportunity 8: access to accurate and timely information 
forms the foundation of effective decision-making in off-site production execution. The reliable data leads to 
quicker and more confident decisions, aligning with lean principles by reducing delays and uncertainties that can 
lead to waste. 


Four opportunities are summarized in the material flows. Opportunity 9: effective coordination between the 
production scheduler, materials supply, and inventory management is vital for lean management in off-site 
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construction. By synchronizing production schedules with the availability of materials and maintaining an 
optimized inventory level, the production process could embrace increased efficiency. Opportunity 10: 
optimizing the layout of the factory is a strategic improvement opportunity that can significantly enhance lean 
management. By arranging workstations, storage areas, and production lines logically and efficiently, material 
flow is streamlined. This optimization reduces unnecessary movement, transportation, and waiting times, leading 
to improved overall efficiency and resource utilization. A well-organized layout also supports just-in-time delivery 
and minimizes the chances of defects or errors. Opportunity 11: increasing the visibility of waiting times is crucial 
for waste reduction and process improvement. By close monitoring and making waiting times transparent to all 
stakeholders, bottlenecks and delays are quickly identified. This visibility prompts immediate action to address 
issues, prevent idle time, and maintain a consistent production flow. Opportunity 12: the practice of reworking, 
reusing, and recycling defective items is a sustainable and lean-approach. Efforts are made to salvage and 
repurpose them where possible, leading production management to the more resource-efficient and 
environmentally responsible. 


5. DISCUSSION AND CONCLUSION 


The OSC production is that the dynamic nature of the production requires a rapid response. However, OSC 
production, unlike a generic manufacturing facility, has long order lead times, relies on several tiers of 
subcontractors and casual labours, and requires close coordination. This paper maps OSC production processes in 
the factory and applies the VSM approach from lean manufacturing to lean OSC management opportunities. The 
findings from the RC slab case demonstrate the identification of the existing wastes, non-value-adding activities, 
and potential lean management opportunities. Examples of the opportunities include eight opportunities in 
information flows and 4 opportunities in material flows. 


The contribution of this paper mainly lies in advancing lean management practices in the field of off-site 
construction. By offering practical, and customized solutions, it is promising to drive positive changes, optimize 
production processes, and foster continuous improvement within the OSC production. 


There are several limitations in this paper as well. First, this paper involves one typical case of RC slab without 
detailed analysis on the generalizability to other products. Further, VSM is a dynamic tool, and the process should 
be revisited periodically to ensure ongoing improvements. Future research could focus on how to apply a more 
scientific and rational approach to further improve the quality and management efficiency of OSC products in the 
identified management phases. 
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ABSTRACT: The theme of ’The Impact of Engineering Practices on a Sustainable Built Environment’ emphasises 
the importance of considering various dimensions of resilient infrastructure. Selecting the location for a 
Hyperscale Data Centre is a crucial process that involves assessing the impact of various location variables. To 
determine the viability of a location, it is essential to identify the potential risks associated with each variable. 
This paper presents a proprietary methodological approach that includes a Delphi study to identify risks, a Likert 
scoring system to assess prior probabilities, and a Bayesian theory-based decision tree to assess the impact 
through risk prediction. The paper's contributions are significant, and the proposed methodology makes it possible 
to predict the risk level of each location variable by identifying the appropriate contingency percentage. The 
study's findings indicate that the paper's proposed approach is an effective way to mitigate the risks associated 
with selecting a location for a Hyperscale Data Centre. Embracing this knowledge allows us to align research 
and practise with the conference’s call to studying the resilience of buildings and infrastructure to natural 
disasters and climate change, and developing strategies for adaptation and mitigation, ensuring that these 
practises become integral to shaping the future of Data Centres. 


KEYWORDS: Bayes Theorem, Delphi, Data Centre, Location Variables 


1 BACKGROUND 


Investments in Data Centres in the Nordic region have been on the rise, with significant contributions from cloud 
and hyperscale investors such as Facebook, Google, AWS and Apple, due to advanced technological progress and 
favourable cold climate conditions, significantly reduce the cooling energy demands of the facilities (Christensen 
et al.,2018; Avgerinou et al., 2017). However, the location of Data Centres outside of the UK presents a significant 
challenge for cost consultants during the capital cost estimation and modelling stages, which can impact 
investment decisions. At the feasibility stage, cost planning involves determining the possible cost of a building 
early in the design stage in relation to the employer's fundamental requirements before preparing a complete set 
of working drawings or quantities bills (RICS, 2011) Historical cost data is often used as base cases for cost 
consulting professionals, who adjust their costs to suit the circumstances of new projects. Although specific 
characteristics such as shape, inflation, and specifications are relatively easy to adjust based on case-based 
reasoning, predicting the impact of location is challenging for construction professionals, who rely on location 
cost indices for this purpose. Various location cost indices, such as Spon's Architects and Builders Price Book 
(AECOM, 2017) and the Building Cost Information Service (RICS, 2018) are available for cost consultants. 
However, such indices are less relevant for Data Centres as there are often no precedents set to use as a baseline 
for cost comparisons, and there are many variables ranging from macroeconomic, construction methodology, 
geographical, and geological categories. For example, regulations for noise attenuation for hyper-size generators 
for Data Centres did not exist in Sweden and had to be modelled on regulations from other countries (Vonderau, 
2017), International location cost indices, such as those provided by Eurostat (EC, 2019), World Bank (2022) 
and the OECD (2022) are broad and mainly model variations at the country level, making them less effective 
during cost planning for individual projects specific to a particular region. Therefore, construction professionals 
must consider multiple factors and rely on a combination of indices and expert judgment to provide accurate cost 
estimates for Data Centres. 


2 RESEARCH AIM 


Whilst a wide range of variables impacts construction project costs and cost modelling, there is no evidence to 
suggest whether and how these variables would affect site location in cost planning for the capital expenditure of 
Hyperscale Data Centres. Although there is published data on traditional construction costs and location indices 
in the UK, they do not provide enough information to assess the impact of location variables, especially 
considering the specific design requirements of Data Centres (King et al., 2023). This highlights a significant 
knowledge gap in the existing body of research. This paper aims to validate a methodological concept using Delphi 
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and Bayesian theory to assess the probability and impact of location variables. This approach aims to aid in 
selecting the appropriate methodological approaches for the research topic of ”the impact of location variables 
on the modelling and forecasting of Hyperscale Data Centres". By utilising this method, the study seeks to identify 
the potential risks and impacts of various location variables, contributing to a more comprehensive understanding 
of the relationship between site location and capital expenditure in the Hyperscale Data Centre industry. 


3 METHODOLOGY 
3.1 Risk 


Risk refers to situations that involve uncertainties that may occur, risk mitigation refers to actions taken to optimise 
the impact of risk. By selecting a comprehensive risk management strategy that considers all types of risk, one 
can ensure the implementation of a planned Data Centre investment within the specified time and budget. Various 
organisations have developed several approaches to risk. Notable among these are the Project Management 
Institute (PMI., 2001) and PRINCE2 (Bentley., 2012). This paper aims to introduce a concept that can quantify 
the impact of risk through a Delphi and Bayesian approach. A risk is defined as the probability of an event 
occurring and the subsequent consequence, as expressed in Equation (1). Here, R represents a risk, P is the 
probability of the event occurring, and C is the impact or consequence of the event. 


R=(P.C). (1) 


Various methodologies exist for identifying risk including identification, assessment, response, and monitoring. 
Risk identification is identifying potential risks that may impact the project. Risk assessment involves analysing 
and evaluating the likelihood of occurrence, impact, and consequences of the identified risks. The risk response 
involves developing a plan to manage or mitigate the identified risks. Lastly, risk monitoring and control. 
Quantifying the impact of risk, especially with location variables, can provide invaluable information to decision 
makers and stakeholders and can be used to make informed decisions, develop contingency plans, and allocate 
resources appropriately. Therefore, developing a method to assess the impact of location variables on project risk 
can significantly improve the success of a project. Risk decisions involve assessing the factors that contribute to 
the emergence of risk and the likelihood and potential impact of the event. 


3.2 Delphi Study 


A pilot Delphi study (King et al., 2023) has been conducted to obtain expert opinions on the key themes that affect 
the location variables of Hyperscale Data Centres and their impact on the modelling and forecasting of capital 
expenditure. The analysis of the pilot study data has provided rigour and validity to the questionnaire for 
the main forthcoming Delphi study. This has allowed for identifying and assessing potential risks associated with 
the location variables of Hyperscale Data Centres. The pilot study results indicate the current understanding of 
the variables that impact the modelling and forecasting of capital expenditure for Hyperscale Data Centres. These 
variables have been identified as potential risks and are an essential consideration in the risk management strategy 
for the planning and implementing Hyperscale Data Centres. Previous research found that pilot Delphi studies are 
rarely reported in academic literature, making it difficult to establish best practices (Clibbens., 2012). For this 
pilot study, industry expert knowledge was obtained through several expert participants (n=5). The response rate 
was 100%. Through an open-ended questionnaire, experts could respond freely and without restriction. Having 
completed the thematic analysis of the data arising from the questionnaire, the pilot study identified categories 
and themes that are considered risk items; the following items were among those rated by the participants as 
having an impact on capital expenditure when locating a data centre: 


e Requirement for cooling towers due to sub-zero climate 

e Requirement to import generators due to in-country shortages. 

e Acoustic screens to generators due to proximity of residential neighbours 

e In-country technical labour shortages require backfilling with imported, experienced technical labour. 


The themes arising from the Delphi study provide the data that will be used to provide the data that will be used 
for the assessment of the impact of location variables within a Bayesian framework. 
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3.3 Bayes Theory 


Bayesian theory is based on the probability theory given by Thomas Bayes in 1763 (Bayes., 1763). Bayes's theory 
relates the conditional probabilities of random variables to each other. It provides a framework that allows for the 
integration of a prior belief about the distribution of a quantity of interest (the prior distribution) and the observed 
data (through the likelihood term). as shown in Equation (2). 


P(A) - P(BIA) 
P(B) (2) 


P(A|B) = 
To clarify, in this instance: 


e P(B) denotes the prior belief (for example, the probability of occurrence of the variable, such as the 
probability of encountering ground conditions) 

e P(BJ|A) denotes the level of impact should that variable occur. 

e P(B) denotes the new site-specific evidence (for example, when new information arises, i.e., a higher 
probability of occurrence of encountering ground conditions) 


Bayes theory can be applied to numerous components by using the product rule (Pearl., 2022) and, therefore, 
Bayes theory is applied for calculating the probability of occurrence of a phenomenon or hypothesis using multiple 
factors or variables. It is also considered a powerful method for hypothesis testing (Wetzels et al., 2012) making 
assumptions and having wide-ranging decision-making applications related to artificial intelligence, machine 
learning, and bio-statistics approaches. Prediction theory is a sub-field of statistics and machine learning that 
involves the development of mathematical models and algorithms for predicting future outcomes or events 
(Sarker., 2021). It uses data from past observations to create models that can be used to forecast future outcomes. 
Prediction theory employs various data analysis techniques like regression, clustering, and classification. It also 
involves identifying essential variables and patterns within the data, calculating the probability of specific 
outcomes, and selecting desirable outcomes based on the model generated. Although prediction theory and Bayes 
theory are related, they differ in terms of their fundamental principles. Bayes’s theory concerns conditional 
probability and allows for the revision of probabilities based on new information or evidence (Ajzen et al., 1975). 
On the other hand, prediction theory is focused on building models and computing algorithms to predict outcomes 
from complex data sets. While prediction theory may incorporate probabilities, it does not involve the revision of 
probabilities like Bayes’s theory. Using Bayesian theory and correlation analysis is a common practice for 
predicting future outcomes and events. In addition, integrating prediction theory with the Delphi method is a 
recognised technique used to forecast future outcomes based on expert opinions (Turoff et al., 2002). The Delphi 
method involves obtaining consensus opinions from subject matter experts through a series of planned interviews 
or surveys, which can then be used to forecast future outcomes. Furthermore, the Delphi method can be combined 
with Bayesian theory to revise established opinions based on the likelihood of different outcomes. This study 
highlights that expert opinions gathered through a structured sampling technique such as the Delphi method can 
be utilised to estimate probable outcomes, which can then be inputted into the Bayesian formula to provide current 
outcomes based on updated information gathered through qualitative risk assessments. The combination of the 
Delphi method and Bayesian theory enhances the accuracy and decisiveness of the mathematical model compared 
to using prediction theory alone. Previous research supported this approach, including Bijak (2011), who identified 
Bayesian theory as a natural methodology for combining expertise and data with expert judgments. Additionally, 
Bernardo (2003) suggests that Bayes’s formula allows for expert opinions to be incorporated into projections in 
the form of prior distributions. However, a limitation of Bayesian forecasts is that they may contain subjective 
elements due to their dependence on expert opinions and history obtained from the data series (Abel et al., 2013). 
In conclusion, the combination of Bayesian theory and the Delphi method can provide a robust methodology to 
model and forecast the impact of location variables on Hyperscale data centres. 


4 DATA COLLECTION 


4.1 Likert 

Psychologist Rensis Likert invented the Likert scale (Likert., 1932). It is a rating scale used to measure attitudes, 
opinions, or perceptions. The scale can have anywhere from 5 to 11 points, with the most common being a 5-point 
scale. It is widely used in social sciences, especially in survey research, as it allows researchers to gather 
information about people's attitudes, opinions, or perceptions systematically and standardised. The scale is also 
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commonly used in market research, customer satisfaction, and employee engagement surveys. The Likert scale 
has several advantages, including ease of use, simplicity, and flexibility. It is easily understood by respondents, 
which can improve the accuracy and reliability of the data collected. However, it is important to remember that 
the Likert scale also has limitations, such as possible response bias, limited ability to capture complex attitudes, 
and the potential for data to be misinterpreted if it is not used appropriately. It is important to carefully consider 
the wording and format of the questions in the Likert scale to minimise these limitations and ensure accurate data 
collection. Additionally, it is essential to use appropriate statistical techniques when analysing the data obtained 
through the Likert scale to avoid misinterpretation of the results. 


4.2 Probability 


To establish the likelihood of events, a Likert ranking has been proposed with two extremes at either end of the 
scale. A score of 1 denotes an event highly unlikely to occur, whereas a score of 5 represents a highly likely 
scenario, as shown in Table 1. For instance, when assessing power availability, one might score it as one because 
the likelihood of that event occurring is low. On the other hand, if there is a substation on-site and the site is 
situated in the centre of a seismic zone, a score of 5 may be assigned since the probability of a seismic event 
causing damage is very high. These scoring descriptions outline the scoring criteria and help prevent ambiguity 
when experts score as part of the Delphi study. 


Table 1: Likert ranking for probability. 
Likert scale Probability 


1 Very unlikely 
2 Unlikely 

3 Neutral 

4 Likely 

5 Very likely 


The variables identified and presented in Table 2 are derived from a previous Delphi study by King et al (2023). 


Table 2: Likert scoring results for the probability of the event occurring. 


Very Very 
Variable unlikely Unlikely Neutral Likely Lik ely 
Cooling towers 4 1 4 41 16 
Imported generators 4 4 32 23 3 
Acoustic screens 4 1 4 39 18 
Technical labour shortage 4 28 27 5 2 


These Likert scoring values are intended to illustrate the proof of concept. They are based on the authors' 
professional judgment regarding the probability of each item occurring in the real world. However, it is essential 
to note that these scores are hypothetical for illustration only to demonstrate the proof of concept. They will be 
subject to revision based on new available information, resulting in updated posterior probabilities that may differ 
significantly from the initial estimates. 


5 RESULTS AND DISCUSSION 
5.1 Establishing Nodes 


The scoring rankings for probability are derived from the Likert scoring results in Table 2 and weighted to generate 
the probability distribution required for the Bayesian analysis. A weighing method has been used to assess these 
conditional probabilities, as shown in Equation (3). 


Occurance 


Total respondants (3) 
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The variables and the conditional probability of these events occurring are shown in Table 3. The results 
subsequently creating the nodes for the Bayesian network. 


Table 3: Conditional probability of the event occurring. 


Very Very 
Item Unlikely Unlikely Neutral Likely Likely 
Cooling towers 6% 2% 6% 62% 24% 
Imported generators 6% 6% 48% 35% 5% 
Acoustic screens 6% 2% 6% 59% 27% 
Technical labour shortage 6% 42% 41% 8% 3% 


Therefore, the node describing the event of Cooling Towers together with possible scenarios of this likelihood 
together with possible assessment factors is as Figure 1 


Cooling towers 


VeryUnlikely 6.00 
Unlikely 2.00 
Neutral 6.00 
Likely 62.0 
VeryLikely 24.0 


Figure 1: Conditional probability node of Cooling Towers being required. 


5.2 Assigning event probabilities 


A process of identifying possible events for each of the variables was established. The basis of the Bayesian 
network is related to determining the relationship of each individual node in the network. For this proof of concept, 
the relationship of individual node was based on the authors’ own experience and assessed using a low, medium, 
and high ranking. For example, the impact of cooling towers is identified in Figure 2. 


Cooling towers 

VeryUnlikely 100 0 0 
Unlikely 50 50 0 
Neutral 0 50 50 
Likely 0 0 100 
VeryLikely 33.3 33.4 33.3 


Figure 2: Conditional probability node of Cooling Towers impact 


These relationships have been used to identify scenarios that could occur because of events in the process of 
assessing the impact of location variables through four ranges for contingency between 0% and 20%, as shown in 
Figure 3. These contingency values have been presented based on the author's experience as proof of concept. 
Further research will be required to refine these contingency values. 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


Cooling towers Technical labour sh... Oto5percent 5to 10 percent 10to15 percent 15 to 20 percent 


Low Low 100 0 0 0 
Low Medium 50 50 0 0 
Low High 0 0 100 0 
Medium Low 50 50 0 0 
Medium Medium 0 50 50 0 
Medium High 0 0 50 50 
High Low 0 50 50 0 
High Medium 0 0 50 50 
High High 0 0 0 100 


Figure 3: Conditional probability node of Contingency for Mechanical 


5.3 Performing calculations 
The Bayesian network conditional probabilities were calculated using Netica software (Ni et al., 2011). This 
resulted in a functional and working network being developed to assess the impact of location variables. After 
calculations, the results of the conditional probabilities were established, as shown in Figure 4. 
Likelyhood 
Probability 


Trade Impact 


Low 29.6 jam 
Medium 25.2 Mmm tt | 
Hig! 45.1 P: 


0 to 5 percent 8.11 M; 
5 to 10 percent 16.0 W: 
10 to 15 percent 
15 to 20 percent 


0 to 5 percent 2.41 
5 to 10 percent 4.61 
10 to 15 percent 26.1 
15 to 20 percent 66.9 I 


Figure 4: Bayesian network identifying trade Contingencies based on conditional probabilities. 


403 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


5.4 Event scenario analysis 


An example of the updated impact of Cooling towers is shown in Figure 5. This event has been modelled on the 
node ’Cooling towers’. A 100% likelihood of this event occurring has been assumed as ’Unlikely’ in this 
hypothetical scenario. 


Likelyhood 


Probability 


Trade Impact 
Tachndenl lahoura horace Mechanical Contingen 


0 to 5 percent 14.8 
EAE a 5to10percent 27.4 
High 451 10 to 15 percent 35.2 


15 to 20 percent 22.6 


VeryUnlikely 
Unlikely 42.0 
Neutral 


Electrical Contingency 
VeryUnlikely Imported generators AA 2.41 


Unlikely Low 10.6 
Neal mai el oiai 204 
Neutral i High 60.7 x 


Like 15 to 20 percent 66.9 


VeryUnlikely 6. 


Unlikely s Low 8.98 
Neutral H Medium 3.04 
Likely I High 88.0 


VeryLikely 


Figure 5: Bayesian network updated with new probabilities impacting Mechanical contingency. 


In this scenario, we have also selected the node for a probability of an increase in cost to 'Medium.' Using this 
Bayesian network, this updated information has impacted the node for Mechanical Contingency, changing from 
15%-20%, as identified in Figure 4, to 10%-15%, as shown in Figure 5. Therefore, in this example, the impact of 
location variables has, using the Bayesian theory, identified an improved risk and reduced contingency for the 
Mechanical Works. 


6 CONCLUSION 


Using a combination of the Delphi study, Likert scale, risk, and Bayesian theory to evaluate the impact of site 
location on capital expenditure for Hyperscale Data Centres has been demonstrated to be a feasible approach. The 
study findings indicate that it is possible to identify the likelihood of specific location variables impacting capital 
expenditure by conducting a Delphi study to obtain expert opinions and utilising a Likert scale to acquire 
subjective information about the probability and perceived risk of occurrence. These probabilities can be 
integrated into Bayesian analysis as prior knowledge, and as new information becomes available, they can be 
updated to calculate the posterior probability. The resulting percentage impact can then be applied to assess 
individual or multiple items and incorporated into the total capital expenditure, providing a method for 
determining the percentage impact, cost increase, or contingency. The findings of this study have significant 
implications for evaluating the impact of location variables for Hyperscale Data Centres, where variables can be 
identified and quantified as a percentage variance to capital expenditure. By utilising a Delphi study, the method 
can gather expert opinions, increasing the reliability and validity of the data obtained. 
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Furthermore, using a Likert scale allows for quantifying subjective information, which can be challenging to 
measure using other methods. Finally, by incorporating the probability and risk of occurrence, the Bayesian 
analysis provides a more accurate assessment of the impact of location variables on capital expenditure. The 
methodology described in this study can be applied to various industries, providing a comprehensive framework 
for determining the impact of various factors on capital expenditure and informing decision-making processes. 
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ABSTRACT: This study describes a multi-year project aimed at digitizing the real estate assets of an Italian 
university, specifically the Politecnico di Milano. The objective is to enhance and streamline university asset 
management through the implementation of Building Information Modelling (BIM) methodology. BIM fosters a 
collaborative environment among stakeholders, facilitating the digitalization of asset management processes. The 
project focuses on modeling the university's assets in a BIM environment, for the creation of a repository of 
structured information that will streamline and optimize the processes related to the buildings’ life cycle. This 
initiative aims to enhance real estate management services, optimize space utilization, and ultimately elevate user 
satisfaction within the university community. The project commenced with an in-depth analysis of the technical 
areas within the university responsible for design, construction, redevelopment, and overall real estate asset 
management. Each of these areas was evaluated for strengths, constraints, and critical points. Various approaches 
to BIM integration were explored to enhance digitalization processes. Based on these initial assessments, a 
comprehensive set of methodological and operational guidelines was formulated, encompassing modelling, 
management, and data-sharing aspects of digitalization. This paper provides an overview of the initial phases of 
the project, highlights its strengths, and identifies areas for improvement and testing in future project development. 
Emphasis is placed on standardizing information to ensure consistency throughout the asset's entire life cycle. 


KEYWORDS: Building Information Modelling, Real estate management, University asset digitalization 
guidelines 


1. INTRODUCTION 


The digital transition in the Architectural, Engineering, Construction, and Operation (AECO) industry has emerged 
as a priority in national policies, thereby garnering attention from administrative bodies. The AECO industry, 
however, remains relatively under-digitized, leading to diminished productivity. This is exacerbated by the 
fragmented processes within the industry, resulting in information loss throughout project lifecycles (Succar, 2010). 


Frequently, the absence of accessible and up-to-date information underlies inadequate resource control in real 
estate asset management (Lauria et al., 2015; Meschini et al., 2022; Vivi et al., 2019). Incorporating operational 
strategies and digital tools like BIM into administrative procedures holds the potential to enhance data quality in 
decision-making processes, thereby fostering better-informed choices (Cacciaguerra et al., 2022; Derakhshan et 
al., 2019; Munir et al., 2019). 


BIM's capability to construct digital building models incorporating information from various project stages 
provides valuable data for operations and maintenance across the building's lifecycle. Moreover, it transforms the 
relational dynamics among stakeholders involved in the construction process (Eastman, C., Teicholz, P., Sacks, R., 
Liston, 2008). 


This research marks the initial phase of an extensive project undertaken by the Politecnico di Milano for asset 
digitalization, which started in 2021. The objective is to establish a BIM management system that enhances and 
streamlines real estate asset management through a digital transition process. 


The project standardizes procedures and processes through the introduction of BIM methodology, aligning with 
the Polytechnic's endeavour to digitize building documents stipulated in the University Strategic Plan. This is 
achieved through proprietary guidelines and operational protocols. The development of these documents emanates 
from an analysis of the current organizational model's state, task and workflow assessments, and interdepartmental 
interactions. Consequently, the project identifies needs and revamps the organizational model to cultivate a 
collaborative environment among contributors to real estate development and management, integrating procedures. 
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This project holds replicable potential: Politecnico di Milano's digital transition process could serve as a pilot case 
for other university campuses, given the shared requirements and challenges of similar Italian institutions. 


2. REFERENCE CONTEXT 
2.1 Government BIM implementation and Guidelines 


The role of governments in the successful implementation of BIM is crucial. Numerous studies have sought to 
evaluate implementation initiatives across various scales, investigating both limitations and drivers for adoption 
(Jiang et al., 2022). Challenges such as institutions unprepared for the market, high associated costs, and lengthy 
training periods have been identified as significant obstacles. Additionally, difficulties arise from a lack of clear 
standards and a structured approach to change management (Elmualim & Gilder, 2014). The adoption of BIM 
within construction processes remained sluggish until government institutions mandated its usage in public works 
projects. To achieve effective implementation, the legislative bodies must strategically plan and structure 
interventions to facilitate adoption. Many governments have championed initiatives aimed at advancing 
construction industry digitalization through BIM (Abdirad, 2017; Liu et al., 2015; Marocco et al., 2023). 


Recognizing the increasingly acknowledged benefits of the BIM methodology, a growing number of nations have 
adopted comprehensive implementation strategies. In the United States, a leading figure in this domain, the 
General Services Administration (GSA) has mandated the utilization of BIM in all projects since 2007, with 
guidelines and standards developed by the National Institute of Building Science (NIBS) (National Insitute of 
Building Sciences US, 2023). The formulation of tailored guidelines, aligning with client needs, facilitates the 
generation of coherent information-rich models, essential for effective BIM implementation within organizations 
(Di Giuda et al., 2017). This underscores the importance of both public and private institutions establishing 
customized guidelines during the initial stages of transitioning to BIM, ensuring clear objectives and actions 
throughout all project phases. 


Organizations equipped with well-defined guidelines can translate their requirements into precise information, 
thus ensuring the desired outcomes. Simultaneously, professionals and manufacturers can follow a structured work 
plan, fostering seamless collaboration and cooperation, and supported by standardized practices. 


2.2 Regulatory background 


The global and Italian legal and technical regulations governing BIM methodology draw upon references from 
both mandatory and voluntary standards. In the European context, Directive 2004/18/EC marked the initial 
directive, subsequently replaced by European Directive 2014/24/EU in 2014. Article 22 comma 4 of this directive 
emphasizes Member States' authority to "require the use of specific electronic tools, such as electronic simulation 
tools for building information or similar tools." This directive compels member states to incentivize, specify, or 
mandate digital tool adoption through dedicated legislative measures. In Italian law, this directive was adopted by 
Delegated Law n. 11 of January 28th, 2016, and subsequently confirmed by D.Lgs 50/2016, followed by its 
implementing decree (DM 560/2017), and the recent D.lgs 36/2023, which reinforces the significance of 
digitalization across the entire procurement lifecycle. This decree outlines the incremental, mandatory introduction 
of BIM in public procurement. Organizations are obligated to establish a comprehensive plan for staff training, 
hardware and software acquisition and maintenance for digital decision-making and information management 
processes. The decree underscores the importance of an organizational framework that articulates the control and 
management of information and related aspects. DM 312/2021 incorporates elements from the preceding decree, 
explicitly stipulating that models should be accompanied by decision-support workflows. Furthermore, this decree 
references the use of technical specifications in accordance with voluntary European technical standards (EN or 
EN ISO), international technical standards (ISO), and national technical standards (UNI). Notably, in the 
construction industry, relevant standards encompass the ISO 19650 Series on BIM-based information management, 
the UNI 11337 Series on digital construction information process management, the ISO 21500 Series on project, 
program, and portfolio management, the 55000 Series on asset management, and the ISO 9000 Series on quality 
management. 


3. DIGITALIZATION ROADMAP METHODOLOGY 


One of the principal challenges of a digital transition process is related to the need to implement a new working 
logic within practices that are not always regulated but are well-rooted and difficult to change (Ahmed et al., 2017; 
Barbosa et al., 2017) Moreover, the necessity to maintain information at the core of the construction process 


408 


throughout its different phases is a key aspect of BIM methodology and necessitates the practical application of 
concepts such as information sharing and standardization. This often becomes critical as the individuals 
responsible for managing information during the various stages of the building's life cycle are diverse and handle 
an array of disparate or non-uniform information. 


This research introduces the initial stages of the digital transition process within a public university, which began 
in 2021 and anticipated to be completed over approximately six years. It is segmented into three strategic work 
phases. The project's long-term objective is to execute the digital transition process for the digitalization of the 
university's real estate and its management procedures through BIM (Fig.1). The proposed methodology can be 
readily replicated for other university real estate assets. 


The project addresses the need to integrate the new digital work methodology into an organization with established 
processes while striving to strike a balance between coexisting established practices and implementation of new 
ones. To achieve this, a framework was established across three consecutive and interconnected phases, 
progressively integrating novel methods. Emphasis was placed on optimizing work processes and resolving critical 
issues primarily stemming from information loss during the transition from building design to management phases. 
Maintaining the structuring and management of information vital for the building management phase remained 
the focal point of the project. 


The first phase represents a strategic stage, during which strategic macro-objectives of the digitalization project 
were defined. The research context was examined, organizational processes were analyzed, and project-specific 
objectives were formulated, aligned with regulations for optimization purposes. This phase culminates in the 
creation of the Methodological Guideline, a strategic document that elucidates project objectives and the 
application context, outlines the document structure comprising the Guideline, and delineates the roles of involved 
stakeholders. 


The following phase, the application phase, focuses on delineating the information content of digital models 
through the development of operational documents. These documents outline modeling specifications, element 
hierarchies, information granularity, and asset attributes. The phase concludes with the initial implementation of 
case studies to validate guideline content. 


The ongoing third phase involves the practical integration of BIM into facility management, enabling 
comprehensive utilization. This phase necessitates comprehensive staff training to ensure the adoption of 
guidelines and their gradual integration into strategic projects, ultimately leading to their universal application. 


The described workflow, reflected in the hierarchical structure of the Guideline's documents, is characterized by 
three levels: 


- Methodological Guideline: The primary document defining the foundational principles underpinning the 
application of BIM methodology within the university's processes. 

- Operational Documents: These executive documents address specific concerns. 

- Project Operational Documents: templates for specialized documents mandated by regulations in 
contracts. 


Strategic Phase Operational Phase BIM implementation Phase 


Project si 
macro-objectiv 
definition 


Output Case studies BIM methodology Use of guideline 
definition definition testing training in full operation 


Validation and Guideline 
optimization ntroduction in 

strategic 

projects 


and reference 
standards analysis 


Methodological Operational Documents 
Guideline 
- BIMprocesses protocols 
- Hierarchical decomposition of 
elements 
Modelling protocols 
- Element attributes definition 


Fig. 1: Digitalization roadmap 


409 


3. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPI 


4. POLYTHECNIC OF MILAN CASE STUDY 


The Polytechnic’s real estate holdings are spread across six campuses situated in different cities, comprising a 
total of 117 buildings and covering an extensive area of 467,000 square meters. The university community 
comprises approximately 53,264 individuals who engage with and utilize these facilities in various capacities. 
These include students, researchers (including research fellows and PhD students), professors and technical and 
administrative staff (Fig. 2). 
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Fig. 2: Consistency of the real estate assets of the Politecnico di Milano 


The Politecnico di Milano is a large, complex organization and its management is quite challenging. To address 
this complexity, the institution has undertaken the development of the BIM Management System Standards. These 
standards have been devised while considering the regulations pertaining to quality, asset, and project management, 
as well as information management. While the adoption of these regulations remains partial within the technical 
domains and their associated activities, the overarching goal is to achieve comprehensive integration into all 
operational processes. Such integration is anticipated to yield intricate synergies, fostering streamlined operations 
while concurrently minimizing both effort and expenses. 


The following sections provide an overview of the stages completed to date in this ongoing process. 
4.1 Strategic and organizational Phase 
4.1.1 Organization chart and existing processes analysis 


Achieving a successful digital transition process requires integrating change with existing processes. Therefore, 
the initial phase of the process was dedicated to comprehending the organization and its operational logic, achieved 
through an in-depth examination of the departments involved in the university's building processes. 


At the organizational level, a dedicated BIM Task Force was established at the central university level to oversee 
the project. This Task Force, comprised of technical and research personnel, was specifically constituted for this 
purpose. 


By examining the organizational structure, the primary departments engaged across all stages of the building 
process were identified (Fig.3): 


- Technical Building Area: This office assumes responsibility for the strategic planning and coordination 
of real estate development, encompassing activities such as extraordinary maintenance, restoration, 
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rehabilitation, building renovations, and new construction. 

- Infrastructure and Services Management Area: This office tasked with maintaining university spaces, 
ensuring their livability, security, cleanliness, and the provision of services and resources essential for 
administrative operations. 

- ICT Services Area: This office manages the provisioning and administration of ICT services, facilitating 
the cohesive management of information in support of governance, administration, and all stakeholders. 
Its primary role within the process is to facilitate information management. 
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Fig. 3: Examination of the University Organizational Structure and Identification of Involved Departments 


The workflows of these offices were studied through the analysis of the documentation provided and a survey 
campaign, with the aim of learning about the procedures and interactions between internal and external parties and 
assessing possible problems. 


This phase of the offices analysis occurred in two cycles: 


- The first cycle involved conducting standardized interviews across all offices to identify characteristics, 
responsibilities, tasks, and structural facets. It further sought to analyse the means of communication, 
document exchange, and interactions among individuals, both within offices and across different 
departments and external entities. 

- The second cycle encompassed tailored interviews for each office, delving into specific inquiries 
pertaining to the adoption of information modeling, particularly during the design and management 
phases. 


The main critical issues highlighted by this analysis were: 


- Difficulties in collaboration between offices. 

- Inadequate management of the document exchange from construction to building management phases. 
- Unclear delineation of roles and responsibilities throughout distinct stages of the construction process. 
- Absence of standardized storage systems. 
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- Inaccurate or incomplete building information. 

- Lack of uniform and shared coding across offices of spatial and technological-functional elements. 

- Absence of protocols and technical specifications for the transition from the design phase to the 
management phase. 

- Lack of a definitive list detailing elements to be maintained and their associated attributes. 


All existing processes were then mapped and diagrammed through Business Process Model and Notation (BPMN), 
an internationally recognized open standard that provides a graphical notation for specifying an organization's 
processes. This graphical notation system has gained prominence within the AECO industry due to its capacity to 
simplify comprehension between individuals with diverse backgrounds enhance interoperability. The schematic 
representation of processes promotes clearer apprehension of tasks, roles, and information exchanges, and 
consequently, information requirements (ISO 19560) are defined. In that way, processes are translated into a 
computing language, making possible future automation (Fleischmann et al., 2012; Meschini et al., 2023) 
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Fig. 4: Exemplification of a BPMN extrapolation illustrating an analyzed process and its redesign proposal 
through BIM. 


Processes have been classified into three categories: 


- Main processes: These encompass workflows that involve activities spanning multiple departments. They 
relate to tasks such as designing and constructing new buildings, revitalizing existing structures, and 
initiating property management. 


These can be represented in: 


o Process models: sequence or flow of activities with the goal of accomplish a task; 
o Collaboration models: a set of processes that work together for a purpose and are individually 
referred to as actors involved in exchanging information. 
Main processes may contain sub-processes and recursive sub-processes, which may be process or 
collaboration models: 
-  Sub-processes: These represent processes identified within main processes. 
- Recursive sub-processes: These further delineate processes within main processes, outlining a series of 
activities conducted between departments. These sub-processes can be invoked within various higher- 
order processes. 
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After sistematically analyzing existing processes, a proposal to reimagine these processes using a BIM-based 
approach was formalized. This proposal underwent subsequent rounds of review, revisions, and validation by the 
offices that had been previously investigated (Fig.4). 


Furthermore, an updated organizational chart was devised, introducing new roles within each department. These 
additions catered to the responsibilities of individuals resulting from the integration of BIM into processes, as 
mandated by technical regulations. 


4.1.2 Defining the specific goals and Organizational needs 


The specific objectives of the project were established by deriving insights from the strategic objectives defined 
within the transition process framework. This process was further informed by an analysis of the structure and 
evidence of critical issues uncovered during the process study. 


This was supported by the BPMN flows study, which examined the path and importance of each piece of 
information throughout the building process, from the design phase (Technical Building Area) to the management 
phase (Infrastructure and Services Management Area). 


The needs of the new building asset management system were identified with respect to the needs of each of the 
previously identified offices, with the aim to solve the highlighted critical issues. In particular, the new building 
asset management system must ensure collaboration among the subjects by optimizing building management 
through coordination among various project disciplines. The need to collect information consistently despite 
different data collection occasions must be considered to ensure reliable and up-to-date information in a single 
BIM repository of as-built BIM models (Fig.5). 
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Fig.5 Goals of the New Building Management System 


This new system is characterized by its focus on shaping a dataset of information tailored to enhance real estate 
management. Through this strategic emphasis, the project aims to forge a comprehensive and coherent solution to 
the challenges identified within the organization's building processes. 


413 


4.2 Operational Phase 
4.2.1 Define the information requirement 


Defining information content is essential for proper asset management through BIM methodology. Within the 
Politecnico di Milano Guideline was based on the guidance provided by the UNI EN ISO 19650 and UNI 11337 
series of standards. 


Central to this initiative is the concept that information should be produced and retained with a distinct purpose. 
The process of defining information requirement revolves around outlining the specifics of the information to be 
generated and preserved, all with the overarching goal of ensuring that whatever the stakeholder's role, the 
subsequent objective is achieved seamlessly. Organizations able to govern the management of information during 
the processes allow the progress of both internally and externally coordinated operations to be made fast and 
uninterrupted, obtaining only the products they need, avoiding unhelpful material, and guaranteeing the quality of 
the information obtained. 


Therefore, it is essential to establish a structure capable of managing requirements to ensure the effectiveness of 
data production, collection, and exchange throughout the process. Client awareness regarding the value of 
information content also clarifies client requirements for various aspects, such as production and sharing methods, 
delivery times, and formats. Furthermore, this awareness supports the development of verification and control 
methods. 


The approach to defining information content for the Polytechnic is based on three concepts: 


- Information needs level: detail with respect to quality, quantity, and granularity of data to be adopted to 
define information related to a purpose. 
Information that makes explicit the level of information need is divided into: 
= Geometric information 
= Alphanumeric information 
= Documentary information 
- Granularity: degree of subdivision and specification of information management levels; 
- Data aggregation strategies: ways in which data should be aggregated or disaggregated. 


The process of defining information requirements for the real estate assets of the Politecnico di Milano was 
motivated by the need to address critical issues identified during the study of information flow throughout the 
processes. Specifically, the investigation focused on the issue of data collection's relevance to the maintenance 
phase under different circumstances. 


One of the issues found was the mismatch of elements from the design phase to the maintenance phase. This 
problem was leading to a mismatch between the data entered into the archives during the design phase and those 
entered during the maintenance phase, resulting in the need to re-search the information with a consequent loss of 
time. This lack of correspondence, combined with the absence of an unambiguous listing of assets to be maintained 
and the sharing of related management information requirements, led to the inability to record the related 
operations performed on the items, resulting in the impossibility of timely control their maintenance status and 
standardization of contracts. 


The subsequent focus of the endeavor was on establishing a unified and shared asset list for the As-Built and 
maintenance phases. To achieve this objective, the technological-functional elements that were previously 
employed by the Technical Building Area (a Project Breakdown Structure that segments the project into its 
technological elements used in the design phase) were examinated. Subsequently, we formulated an inventory of 
homogeneous objects that amalgamate various technical elements related to maintenance contracts. This endeavor 
culminated in the creation of a new asset list that seamlessly binds these components together (Fig. 6). 


Through the establishment of this unified asset listing, bridging the technical aspects of the design phase with 
maintenance-oriented elements, the attributes requisite for each identified element during both design and 
maintenance phases were effectively defined. This meticulous approach ensures the determination of the minimum 
essential information required for each phase. 


The level of information detail required will vary during different phases of the building's life and will be managed 
to streamline the recording of essential data, optimizing the time dedicated to information census and modeling. 
At the same time, the quality of the information entered into the system must be excellent, with the goal that it will 
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always be completely reliable. 


To ensure cohensiveness across all information derived from models, at whatever stage of the asset life cycle these 
were produced, the operational document "Modeling Protocols" was formalized. This is a document that makes 
explicit the rules and guidance necessary for the development of information models within the University's BIM 
Management System. The document aims to ensure a defined and shared BIM model structure. 
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Fig. 6 Relationship between hierarchies of elements in the design and in the maintenance phases 


The requirements underpinning the regulation of BIM Modeling in the document, which is based on the relevant 
legislation, are (i) the definition of a clear and shared structure, (ii) the need for continuity between pre-existing 
processes in terms of models and software, and (iii) streamlined models that can be used and reworked efficiently 
while ensuring the exchange of information. 


The protocols are structured with consideration of three information modeling scenarios that the Organization may 
face: 


- new construction interventions. 
- representation of the existing. 
- existing redevelopment interventions. 


The paper explicitly outlines the technical characteristics models must possess to ensure consistency within the 
management system. A particularly important aspect of ensuring consistency between model elements is the 
mapping process, which relates elements to: (i) the Product Breakdown Structure of the assets previously 
illustrated, (ii) the categories of the modeling software chosen by the university, (iii) and the related entities for 
exporting the models in open format IFC (Industries Foundation Classes) ensuring interoperability (Laakso & 
Kiviniemi, 2012) (Fig. 7). 


Throughout the transition project, we identified software to be used for information management based on various 
purposes. Rearding modeling, information collection, and management in the later phase, a comparative 
evaluation of FM and CDE software was conduced to determine the most suitable ones at the technical level 
according to the University’s requirements. 


415 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


To validate the correctness of protocols and decisions in the operational documents, two pilot buildings were 
modeled following the guidelines. Pilot cases were selected to test an already existing building and a new 
construction project, ensuring that each. 
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Fig. 7 Example of boiler element mapping 


5. FIRST RESULTS DISCUSSION AND FURTHER DEVELOPMENT 


To date, the digitalization project of the Politecnico di Milano organization is an ongoing process expected to be 
completed by 2026. The first two phases of the project have concluded successfully with the approval of the 
Methodological Guideline and Operational Documents, detailing specific features of information modeling. These 
documents were validated through the modeling of two case studies to verify the accuracy of the decisions made. 


The University Governance intends to base new procurements addressing the university's space needs on these 
approved Guideline documents. This approach ensures that new construction adheres to defined rules, gradually 
populating the university's BIM repository with consistent information. The ability of external parties to use the 
Guideline will enable the research team to validate comprehension and correctness in data reception, addressing 
and resolving any issues through updated document versions. Therefore, establishing protocols for internal data 
verification is essential. 


The subsequent phase of the digital transition project involves the actual implementation of the BIM methodology 
within the organization. This phase commences with comprehensive training for the offices on the use of the 
Proprietary BIM Guidelines and compliance with the new standardized procedures. This training ensures their 
assimilation and prevents the distortion of client requirements in terms of information content over time, which 
could render the entire data processing process ineffective. 


6. CONCLUSIONS 


This work explored the implementation of a digital transition process for the property management of a large 
university. The application research was structured in three progressive phases, aiming to optimize the 
management of the information flow through process implementation, resulting in improved management of the 
entire building process while structuring information essential for the real estate management phase. 


The specific focus was on defining a consistent information set derived in part from existing processes. The 
implementation addressed critical issues related to the loss of useful information during the transition from the 
design phase to the management phase. The definition and subsequent verification of information requirements by 
the organization are essential for properly aggregating and disaggregating information as needed. 
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For this type of process to succeed, the organization must possess proprietary Guidelines aligned with its processes 
and requirements while internally training its staff to recognize the importance of maintaining consistent 
information content. 
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ABSTRACT: The integration of Building Information Modelling (BIM) and Geographic Information System (GIS) 
with Business Intelligence (BI) is promising for managing vast and diffused assets. It enables valuable insights 
into asset performance and resource uses, supporting savings, improved efficiency, and sustainability. The 
research proposes a web-based Asset Management System application (AMS-app) via BIM-GIS-BI integration, 
providing an updated digital representation of university assets by combining spatial, performance, and operation 
data with related analytics. The AMS-app was developed in the context of the University of Turin's strategic plan 
as a pilot case to improve asset management procedures through a data-driven approach. Indeed, campuses are 
complex assets managed by multiple actors through still document-based and fragmented databases, often leading 
to ineffective and untimely decision-making processes. The AMS-app represents a valuable decision support system 
for facility managers aimed at asset monitoring and user experience improving through better and more 
sustainable decisions concerning space, occupancy, and indoor environmental quality (IEQ). To demonstrate the 
effectiveness of the BIM-GIS-BI integration through the AMS-app, several case studies were implemented with the 
following objectives: (i) the digitalization of university building data, (ii) the optimization of courses timetables 
according to space availability, (iii) the optimal workstations management, and (iv) the analysis, monitoring and 
optimizing of IEQ and comfort via IoT networks. The paper illustrates the advantages and applicability of the 
developed methodology through the case studies, and further developments in university asset management. 


KEYWORDS: Asset Management System, BIM-GIS integration, Business Intelligence, Information management 


1. BACKGROUND AND MOTIVATION 


University assets, especially Italian ones, are characterized by strong management complexity due to their diffused 
buildings, often built in different eras with various construction technologies and high heterogeneity. The 
management is often based on fragmented, incomplete, and hardly accessible databases, preventing the correct 
definition and optimization of usage patterns, as well as the normalization of management processes (Qian and 
Papadonikolaki, 2020). This results in inefficiencies in services and maintenance activities, leading to wasted 
resources and inefficient decisions concerning the expected performance, user comfort, and economic and 
environmental sustainability. Thus, university assets represent a crucial opportunity to propose a solution to the 
information gaps currently found in managing large assets. There is an increasing need for the adoption of 
information management strategies aimed at shifting from highly document-based and fragmented approaches to 
digital and collaborative ones (Chen et al., 2015). Digital tools and effective information management strategies 
enable data integration, ensuring the availability of accurate information with various granularity levels, at the 
right time, in the required formats, and throughout the asset lifecycle. Despite its application in asset management 
is still rare (Moretti et al., 2021), BIM-GIS integration provides high potential (Liu et al., 2017; Beck et al., 2020), 
especially in borrowing the Smart City concept at the Campus scale, improving the management of such complex 
assets for a better user experience and optimal resource utilization (Lu et al., 2020; Ward et al., 2021; Wang et al., 
2019). BIM enables the development of highly detailed building information models, while GIS allows their 
management and analysis through a global spatial reference system (Zhu et al., 2021). BIM-GIS integration 
combined with BI tools can be exploited to optimize the management of large assets and to foster the development 
of AMS tools as concrete decision support systems (Pärn et al., 2017). The further integration of IoT and digital 
devices can facilitate data collecting, providing a better maintenance and asset management through the monitoring 
and analysis of real-time data about asset performance and condition, enabling timely interventions (Wong et al., 
2018). 
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The research project exploits BIM-GIS integration with BI tools to develop an interactive, web-based 3D map 
(AMS-app) for the management of large and diffused university assets. The main objective consists in facilitating 
information management and decision-making processes by improving information accessibility and sharing 
among stakeholders, and normalize management processes. The paper illustrates the replicable methodology 
developed to define and implement the AMS-app within the management system of the University of Turin (i.e. 
UniTO), an emblematic case for testing and demonstrating the potential offered by such a decision support system. 
Then it describes how data are collected from various siloed databases and integrated, providing an easily 
accessible and implementable knowledge base. Indeed, the AMS-app collects all the data currently handled 
separately by different administrative offices, providing a still independent but collaborative management system. 
Thus, the asset can be managed at the system level, rather than at the level of individual isolated buildings. The 
potential that such an AMS-app can provide to asset and facility managers of large university assets is described 
through the illustration of several case studies implemented so far, selected based on the management needs 
encountered within UniTO. Finally, the results are discussed, and the potential future developments and 
implementations are reported. Further objectives concern the development of information modelling protocols and 
guidelines to facilitate the adoption and transition to such a digitalized and shared management approach. In this 
way, the data can be modeled and structured ensuring the availability of accurate information at the right time, in 
the required format ,throughout the asset lifecycle, and the method can be easily replicated. 


2. METHODOLOGY 


The main steps of the methodology adopted to develop and implement the AMS-app in the university 
organizational structure are illustrated in the following paragraphs. 


2.1.1 | Main objectives and needs definition 


The first step concerns the definition of the main objectives and needs that the organization intends to handle via 
the AMS-app. The organizational structure was analyzed to understand the management issues faced by UniTO 
and to define the relevant case studies to be implemented and tested through the AMS-app. Consequently, meetings 
and interviews with the managers of the technical areas in charge of the university asset management, its 
maintenance, information systems management, as well as teaching and educational services were conducted. 


2.1.2 Current database structure and dataflow management strategies investigation 


An investigation of the procedure currently adopted by the university to manage the information flow has been 
conducted to identify which information and data are handled by the different technical areas, as well as the 
methodology used to produce, store, and exchange them. One of the main aims of the analysis concerned the 
identification of the tools and formats currently exploited by the areas so that the AMS-app could be developed 
without disruptively changing current workflows, facilitating its adoption. 


2.1.3 Organization information exchange and information requirements formalization 


Once identified the organization's current database structure and dataflow management strategies, the analysis and 
formalization of the information exchange among the technical areas have been performed. The main aim was to 
ensure the easy integration of modifications occurring over time, providing a constantly updated digital 
representation of the building asset. The information exchanges were formalized to ensure that the technical areas 
can continue to rely on current tools and procedures with minor changes in data production and management. Then, 
the Information Requirements (IRs) have been defined to foster communication and support the creation of a 
coherent database to feed the AMS-app. Table 1 provides an example of an exchange information requirement 
schedule. 


Table 1: Example of an exchange information requirement in the standardized form for data collection. 


Field Data source Note Type 

Building code OpenSIPI, Technical Areas: EDISOS, Building coding (Settlement_Building: e.g. Text 
SILOM, SIPE, Asset Management 029 B) 

Building name OpenSIPI, Technical Areas: EDISOS, Free field Text 


SILOM, SIPE, Asset Management 


Main address OpenSIPI, Technical Areas: EDISOS, Free field Text 
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SILOM, SIPE, Asset Management 


Municipality OpenSIPI, Technical Areas: EDISOS, Extended name, es. Turin Text 
SILOM, SIPE, Asset Management 


Main use OpenSIPI, Technical Areas: EDISOS, Free field Text 
SILOM, SIPE, Asset Management 


Building Type OpenSIPI, Technical Areas: EDISOS, Bound field: Building, Portion of Building, Text 
SILOM, SIPE, Asset Management Agglomeration of Buildings, No Type, ND 

Status OpenSIPI, Technical Areas: EDISOS, Restricted field: Decommissioned, In use, Under Text 
SILOM, SIPE, Asset Management construction, No Status, NDo 


Furthermore, a possible change in the university's organizational structure was investigated for better supporting 
the AMS-app adoption. 


2.1.4 AMS-app development: approaches, technologies, and tools selection 


As stated before, the scientific literature has widely discussed how BIM and GIS integration can bring significant 
benefits in the field of asset management. BIM enables an overview of a single building and GIS allows the 
contextualization of each building, enabling analysis at a territorial level. Thus, the two resources optimally lend 
themselves to the university campus management, which, given its complex infrastructure, the amount of 
heterogeneous data, and spaces spread over a vast territory, needs systems that can facilitate the management, use, 
and maintenance of their assets at different levels. The proposed system integrates these resources providing the 
necessary tools for understanding, visualizing, and analyzing information related to the university building stock, 
its services, and infrastructure, thanks also to the support of BI tools. Data are core to the system, populating it and 
being the key to the different components’ connections. The platform integrates data of various nature and from 
multiple sources: geographic data, geometric and functional data, as well as data concerning IEQ derived from 
sensors (Table 2). 


Table 2: Data sources, information and types. 


Source Information Type 

Piedmont Territorial Building location, heights, geometries, restrictions Text, numbers, coordinates, shape files. Static 
Geoportal data. 

OpenSIPI Name, encoding, address, floor numbers, state of use, Text, numbers, drawing. Static data. 


geometries, area, department assignment 


Department offices Courses schedule, personnel employed information, Mainly text. Sheet-form organization. 
buildings construction site Dynamic data (on long term) 

University website Timetables, organizational units, building property state and Mainly text. Sheet-form organization. 
expenses Dynamic data (on long term) 

IAQ Platform Environmental sensors measurement (CO2, humidity, Numbers. Dynamic data (on brief term) 


temperature, VOC, PM2.5, etc.) 


The different datasets are then collected from various sources, processed, and stored in a cloud repository so that 
data can be queried by any operator without duplicates, errors, or information loss. Aiming at the optimal 
integration between the different data, information sheets have been prepared and provided to the technical areas, 
asking for their fulfillment. Nonetheless, their compilation is not always possible such as in the case of geometric 
data derived from drawings or models, or geographical data from cartographic services such as the Geoportal of 
the Piedmont Region. In these cases, the sheets have been compiled manually by the authors. 


Once the needed data have been collected, they are interlinked thanks to the semantic association of encoded 
names. An encoding system was defined for each city, building, and space, which is part of the university asset, 
starting from the one currently used by the university’s administration to promote a smooth integration process. 
The encoding system was also key for data association in GIS. The entire campus buildings were identified, 
geolocated, and associated with their encoded names. The map was developed in a 3D view, aiming at offering a 
better perception of urban space, asset consistency, and distribution. Thanks to the geolocation of the encoded 
buildings through a 3D environment, and to the association with functional data, various analyses were developed 
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at the territorial level, providing the depth knowledge of the university asset with information on its use and 
consistency, useful to support decision-making processes. 


QGIS was selected for the shape files creation and first population, while Mapbox was exploited for the 3D map 
development and visualization. It was chosen for its easy integration with Microsoft Power BI, thanks to the 
opportunity to generate a customized web map based on the created shape files, shareable through URL and carrier 
of selected information. In addition to the GIS representation of the university asset, BIM was exploited for the 
analysis and visualization of each single building at different levels. The modeling phase succeeded in exploiting 
the Autodesk Revit Software for geometrical construction and the Visual Programming Language Application 
Dynamo for information population. The BIM modeling and the Dynamo nodes were developed with the aim of 
generating a replicable workflow, adaptable at each building of the asset. So, each BIM model presents three 
different levels of consistency: the building volume, the building levels, and the building spaces, modeled as mass, 
floors and rooms, respectively. Each element is firstly associated with the corresponding code, then the encoded 
name and the Revit Element ID are exported for each geometry modeled. The encoded name association ensures 
the association between the different datasets, but the connection between the BIM geometries and the information 
occurs via the Revit Element ID association. It is needed for various reasons, for example, some rooms may 
temporarily present the same encoded name during a space renovation, or their geometry can change throughout 
the building lifecycle, as well as their name. Despite the mutability of information such as the name, the Revit 
Element ID remains unchanged over time. 


The BIM model is then associated with information related to the standard use of the building and its characteristics. 
During parameters’ creation, a parameters comparator enables to individuate the shared parameter between masses 
floors and rooms, avoiding repetition that could lead to association errors or can compromise the final model 
quality. Both the Revit parameters and the associated values are extrapolated by the same dataset, stored in the 
cloud repository, and in the future, each building of the university asset will have its own datasets for better 
management. At this stage, the association of the most dynamic information, such as the real-time collected data, 
has been avoided. In this way, the model can represent the basis for multiple analyses related not only to 
environmental quality but also to space occupancy or educational and working spaces management. This choice 
provided also lighter models, with smaller file sizes and greater smoothness. Finally, with the aim of facilitating 
BIM and Microsoft Power BI integration, the BIM model was exported with Proving Ground Tracer. This software 
allows the exportation of both 2D and 3D geometries with related information, all preserved in an SQL database 
generated by the software itself. At the end of the process, the BIM is then ready to be imported and visualized in 
Microsoft Power BI. It demonstrated great capacity in data analysis and visualization, allowing information sharing 
between different stakeholders through analytic dashboards which can involve data, BIM models, and GIS maps. 
So, it represents the preferential software for web-app structure development. 


2.1.5 Data visualization, analytics, and dashboard structure definition 
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Fig. 1: UniTO’s asset management system platform based on BIM-GIS-PBI integration 
Power BI was chosen not only for data analysis and visualization but also as the software to generate the AMS- 


app main structure. The dashboards created are directly connected to datasets stored in the cloud repository and 
can be integrated with BIM and GIS maps. Thanks to these connections, based on the relationship of the encoded 
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names, it’s possible to associate the resources, the data previously excluded, such as the dynamic data related to 
real-time environmental quality measurement, the course timetables, the occupancy rates, etc. The platform is 
structured so that the dashboards are gathered in thematic reports (asset consistency, economic evaluation, 
construction sites, IEQ, spaces’ occupancy, etc.), classifiable also into territorial and building reports, depending 
on the level of information visualizable. Each report is correlated to the others based on a predefined structure (Fig. 
1), but still independent, allowing to introduce different accessibility levels according to the requesting user. 


3. RESULTS AND DISCUSSION 


The proposed methodology and AMS-app are being applied as a pilot study to the UniTO asset management, as 
described in the following sections. 


3.1 UniTO pilot case 
3.1.1 Main objectives and needs definition 


UniTO aims to create an integrated AMS-app that enables updated data visualization to optimize the management 
of its diffused asset, providing information about performance and resource utilization, to promote cost reduction 
and improve efficiency. The main goal consists in improving the management of spaces and resources. The AMS- 
app enables to identify and monitor the asset over time, leading to better decision-making concerning space, 
occupancy, and IEQ management. It was applied to digitalize the whole UniTO building stock, providing an overall 
view with data about its consistence and usage (i.e., geolocation, building asset consistency, geometrical and 
financial data, building performances, rooms capacity, equipment, and performances, occupancy level, and usage, 
etc.). The updated visualization of the building portfolio through a 3D map with data and information handled 
through GIS, BIM, and BI systems aims to support the: 
e Management of university facilities at the territorial and building levels allowing to produce data analytics 
at different layers for improved facility operation; 
e Optimization of university teaching timetables based on the actual teaching space availability and 
capacity; 
e Optimization of workstations management by introducing remote working strategies; 
e Analysis and optimization of IEQ to improve users’ comfort and safety through IoT monitoring. 


3.1.2 Current database structure and dataflow management strategies investigation 


Currently, the university technical areas handling data and documentation about facilities, spaces and related 
equipment, performance, capacity, workstations' number, and concerning the teaching timetables’ definition, rely 
on a document-based system characterized by siloed information. Data about spaces and related usage are stored 
and exchanged via semi-structured formats such as .xls or .csv. Information exchanges take place mainly via 
traditional communication systems (printed reports, e-mails, calls, in-person or remote meetings, etc.), and data 
are not shared between stakeholders or administrative offices, resulting in struggling or absent analysis of 
integrated aspects and information to support the decision-making processes. Thus, a data integration system is 
proposed with the aim of creating an updated tool collecting data from several sources, shared among the whole 
UniTO staff. The different data silos are integrated by exploiting BI tools and methods, enabling to collect and 
integrate data from the various Excel sheets produced by the technical areas. In this way, it is possible to maintain 
the current processes adopted to manage space, staff, and usage data, avoiding disrupting the management 
procedures. Furthermore, it is possible to avoid the effort of creating a structured DB (e.g., a relational SQL 
database) that would have been more disruptive and difficult to integrate within the UniTO management system. 
This allows for the gradual and low-impact integration of the AMS-app and the system, with a greater likelihood 
that it will be concretely used by UniTO staff. 


3.1.3 Organization information exchange and information requirements formalization 


The data integration system has been set up by a cross area within the UniTO organization represented by a research 
group of seven people which acts as a data analysis area. The authors are part of this research area that collects the 
different data from UniTO areas integrating them through the proposed system (AMS-app) according to the 
schema illustrated in Figure 2. 
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Fig. 2: Information exchanges and data processing schema within UniTO organization. 


Data are retrieved from UniTO technical areas via standardized forms (Table 3), and the data analysis area receives 
information concerning buildings, spaces, capacity, occupancy, people occupying the spaces, etc. through similar 
forms. This enables quick data management via structured data sheets and avoids the cleaning and processing of 
data first. The integrated data, the resulting information, and knowledge are made available and displayable via 
the platform, enabling scenario-based analysis to support the decision-making processes of the UniTO's asset 
management staff. 


Table 3: Example of standardized form for collecting data about some of UniTO facilities, collected in the. 


Building Building name Main address Municip Main use Building Status 
code ality type 
001_A Palazzo del Rettorato Via Verdi 8 Torino "Uffici a supporto didattica e Fabbricato Tn uso 
ricerca 
020_A Palazzo Nuovo Via Sant'Ottavio 20 Torino Dipartimenti / Biblioteche / Aule/ Fabbricato In uso 


Uffici a supporto didattica e ricerca 


021_A Palazzetto Aldo Moro Piazzale Aldo Moro Torino Uffici a supporto didattica e ricerca Fabbricato In uso 
/ Aule / Dipartimenti 


029 B Campus Luigi Einaudi Lungo Dora Siena 100 Torino Dipartimenti / Biblioteche / Aule/ Fabbricato In uso 
Uffici a supporto didattica e ricerca 


032_A- Centro Pier della C.so Svizzera 185/Via Torino Dipartimenti / Aule / Magazzino/ Porzione di In uso 
B-C Francesca Pessinetto 12 Rimessa / Uffici a supporto fabbricato 
064_A Torino SUISM Piazza Lorenzo Torino Aule/Uffici a supporto didattica e Fabbricato Dismesso 
Bernini, 12 ricerca 


Specific applications of the proposed data integration system and AMS-app are described in the following sub- 
sections, including advantages and limitations. The case study selected as a demonstrator for such applications at 
the building level is the Campus Luigi Einaudi (CLE). The main facility of CLE is located in the northeast area of 
Turin, with a total net area of more than 36,000 square meters. The facility hosts the Department of Law, the 
Department of Political and Social Sciences, and the Department of Economics and Statistics "Cognetti de Martiis", 
hosting more than 500 research fellows. In addition, there are numerous administrative areas with around 100 
administrative employees dealing with the management of spaces, people, contracts, and related bureaucratic 
matters. Furthermore, the CLE has 47 lecture halls and hosts around 16700 university users, so it represents a 
significant case study with a large catchment area and many activities within it. 


3.2 Real estate asset management at territorial level 


Real estate asset management at the territorial level deals with the storing and managing of data concerning the 
location, the type of building property (e.g. owned, rented from another institution or a private subject, partially 
owned or partially rented, etc.), the overall occupancy, the presence of listed buildings, and other facility 
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management data. This kind of data regarding UniTO facilities currently are stored in siloed documents or .xls 
files, preventing the integrated analysis needed to support decision-making processes. Thus, overall dashboards 
regarding the whole UniTO's real estate property are produced to investigate multiple data at once. The dashboard 
maps provide an overview of the entire territory over which UniTO buildings are distributed, useful to investigate 
data at a territorial level. All the views provided on the dashboard pages (e.g., maps, charts, graphs, and cards) are 
dynamic and interactive. A selection in one view, acts as a filter for all other views of the page or of the entire 
dashboard report, according to the specific requirements. The dashboards at the territorial level support the 
decision-making processes regarding the strategies to be pursued at a high level on the whole university facilities. 


Figure 3 shows the dashboards at the territorial level with general data regarding UniTO real estate propriety. Other 
information, such as UniTO facilities under refurbishment and data concerning specific interventions or the listed 
buildings are visualized in other tailored dynamic dashboards, allowing an integrated analysis of the data and 
supporting complex decisions of the UniTO's administrative areas. 


@ Comune Denominazione immobile Spa 


5 = 143 16 4464 159,49K 
: T 
ga~ 5 v S 
aN 4) x : r a 


Denom@rancne Immotte = Indrizte p Numero Local Aree 
ware do Len Via Po 29-31-33-35-3 391 n + . ‘ 
enzz> Rete Va Vero 9 oo 4 ; = 
Be Via Verd 10 . | 
m i Li 
2 37 0 a] 


Fig. 3: AMS-app dashboard at territorial level regarding general data of UniTO real estate propriety. 
3.3 Lecture hall spaces and teaching timetable management 


Managing teaching spaces and, specifically, organizing lesson timetables, poses a multifaceted challenge. The 
problem is intricate due to the numerous and diverse variables involved, especially concerning CLE. Notably, the 
presence of various departments and their distinct methods of allocating the timetable adds to the complexity. 
Additionally, the extensive range of teaching hours dedicated to different subjects and the diverse nature of 
activities carried out further necessitate a highly varied schedule even within the same building. 


The current process for creating class schedules and allocating spaces seems to be structured in stages, with three 
distinct directorates/offices involved. Firstly, the “Educational Services Directorate” supplies information 
regarding course enrolments, course codes, and academic credits assigned to each course. This data is then passed 
on to the “Degree programs office” which operates under the school's guidance, serving the departments and is 
tasked with creating the teaching timetable. Finally, a local branch of the “Building Logistics and Sustainability 
Directorate” is responsible for space allocation based on the received teaching timetables. 
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The entire process is characterized by fragmented data handling and manual switching between different software, 
which could increase the risk of errors. Thus, the primary objective was to consolidate the data into a single data 
analysis platform, ensuring more secure management and facilitating the analysis of teaching space utilization. To 
achieve this goal, a dashboard was developed, providing a comprehensive view of various data that would 
otherwise require gathering from three different offices. This single dashboard allows users to access information 
about the teaching spaces, including classroom names, room codes, capacity, classroom equipment, and net area. 
Moreover, the dashboard also offers insights into space utilization and teaching hours, enabling the estimation of 
the percentage of hours during which classrooms are booked, optioned, or available for use. This centralized and 
user-friendly approach streamlines data access and analysis, enhancing overall efficiency and accuracy in 
managing teaching spaces. Figure 4 illustrates the flexibility of viewing rooms in both 3-dimensional and 2- 
dimensional formats. Regardless of the chosen view, users have the option to select specific rooms and access 
detailed data. The interactive line graph enables direct interaction with the data, and users can also apply filters 
based on the building's floors. At the room level, the scheduled time for each room throughout the week, month, 
or year can be displayed, as depicted in Figure 5. 
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Fig. 4: AMS-app dashboard at building level with the analysis of lecture hall spaces and occupancy. 
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Fig. 5: AMS-app dashboard at room level allowing the analysis of lecture hall spaces, occupancy and timetable. 
3.4 Realestate asset management at building level 


Real estate asset management at the building level refers to multiple activities necessary to operate a facility and 
can include the following: people management including contract management, space and equipment allocation; 
space management including space allocation to the departments and areas hosted in each building. As of now 
building operation at UniTO is performed by the same departments or administrative staff hosted in each facility. 
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Except for space codes, the structure of which is common to all UniTO administration, all other space management 
information is managed locally by each building's space management staff. There is no central structure that 
manages all UniTO facilities at the space level. 


The main objectives of this application are to support the decision-making processes regarding UniTO facilities, 
collect all the necessary information, and produce dashboards at the building level to investigate space, occupancy, 
and people management via BIM-GIS integration and BI technology. The data that are treated are the following: 


e Space data regarding the location inside the building, area, and space typology; 
Allocation of spaces to the departments or administrative areas; 
Allocation of spaces to specific people and their role inside UniTO, contract typology, affiliation to a 
department or administrative area. 


Figure 6 and Figure 7 show the AMS-app dashboard at the building level regarding all the spaces of CLE and the 
specific analysis of occupancy and staff allocation in the offices respectively. 


Figure 6 allows the analysis via dynamic maps and charts of the spaces in terms of space typology and number of 
rooms, the allocation of the spaces to the departments or administrative areas, and single data points concerning 
the space net area and the number of rooms. The page allows the overall analysis of the building and the 
consultation of the general data regarding spaces, as well as the distribution of the space typologies on each 
building floor and the summary of the total building net area and number of rooms. 
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Fig. 6: AMS-app dashboard at building level allowing the analysis of space features and allocation of CLE. 


Figure 7 enables a more specific analysis of the offices at CLE, including the percentages of area and staff assigned 
to each department and administrative area. A graph allows the comparison among the employee and research 
fellows allocated to the spaces and the maximum number of people that can be hosted according to the net area of 
each room, supporting the decisions related to the new staff that can or cannot be allocated in a space. In addition, 
an equal distribution of the spaces among departments and areas can be ensured based on the analyses 
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Fig. 7: AMS-app dashboard at building level with the detailed analysis of office and staff allocation at CLE 
3.5 Space management improvement 


Space management improvement in UniTO pilot case refers to the study of strategies to improve space occupancy, 
allocation, and usage with the introduction of work from home (WFH) practices. The contracts of professors, 
research fellows and PhD students already include flexibility as regards working hours and location, consequently, 
WFH is already allowed. On the other hand, concerning administrative employees, WFH practices have been 
introduced due to the recent COVID-19 pandemic events and then maintained for a total of two days a week. 
However, as of now, no improvement in space management has been introduced related to flexible work scheduling. 


This application of the UniTO AMS-app aims at improving space management by hypothesizing a homogenous 
distribution of WFH days of the employees/researchers of the same office over the working week. As a 
consequence, the maximum occupancy of a single office can be increased. All the occupants are never present in 
the office at the same time, while WFH days are planned so that in each day of the week a certain number of office 
occupants work at home. As a results, the overall occupancy in a building can be increased. Considering two days 
a week of WFH for each worker in a five-day working week, the overall occupancy of the building increases by 
around 60% (Figure 8). This ensures that in the case of recruitment, workstations are already available in the 
existing facilities, without the need of acquiring or renting new spaces. 
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Fig. 8: AMS-app dashboard analyzing space allocation and current number of occupants, the maximum capacity 
of each space, and the maximum one with WFH and strategies to improve CLE space usage. 
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3.6 Indoor environmental quality monitoring and optimization 


Monitoring data related to environmental quality data monitoring within educational spaces is of utmost 
importance. It serves two primary purposes: ensuring a healthy learning environment for conducting lessons and 
detecting any anomalies that may indicate system faults or the need for system remodeling. This approach also 
contributes to energy conservation and the reduction of heating and cooling systems' emissions. Over the past two 
years, several UniTO’s buildings have been equipped with IEQ sensors. Among these, the CLE buildings serve as 
an excellent testing ground for evaluating the effectiveness of these systems due to the high number of sensors and 
the presence of classrooms with different capacities and orientations. UniTO has chosen IoT devices from 
Aircare® that can capture 15 types of measurements, including air quality, environmental comfort, and electro- 
smog indicators. Notably, the devices' accuracy and reliability have been scientifically validated by the Italian 
Society of Environmental Medicine (SIMA) for PM2.5 and CO2 measurements. A total of 39 IoT devices were 
strategically installed across 37 classrooms within the CLE, focusing on the ground and first floors, dedicated to 
teaching activities. Once installed, the data generated by these devices was directly streamed to a cloud platform, 
facilitating data collection. The collected data are accessible through reports, providing valuable information via 
an experimental platform managed by the ICT directorate, and at this stage data aren't shared publicly or with other 
directorates. 


The objective was to optimize the AMS-app potential by linking the data to specific spaces and creating 
interactive dashboards to display real-time data from the continuously flowing information from the IoT devices. 
To maximize the advantages of measuring multiple types of data with a single IoT device, it was decided to 
associate viewable spaces with data related to CO2, CO2e, VOC, PM10, and PM2.5. A dashboard was developed 
to integrate the data with floor plans, clearly indicating the locations of installed sensors in the respective 
classrooms. By selecting a specific classroom, users can readily access and visualize the recorded values 
throughout the week (Figure 9). 
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Fig. 9: AMS-app dashboard at room level allowing the visualization of spaces and data acquired from sensors. 


To serve as alert indicators, specific limit values were established for the main parameters. During the heating 
season, temperature values between 20 and 22 degrees Celsius were selected, while for the cooling season, the 
range was set between 22 and 24 degrees Celsius. The CO2 concentration threshold was defined at 1000 parts per 
million (ppm). As for particulate matter, the limit values suggested by the World Health Organization were adopted: 
below 25 g/m? for PMjo and below 50 ug/m3 for PM) 5. Furthermore, for volatile organic compounds (VOCs), 
the limit of 550 parts per billion (ppb) was chosen, following the guidelines of the US Environmental Protection 
Agency (EPA). This solution offers various advantages: firstly, it allows immediate visualization of the data 
collected by IoT devices and their correlation with specific spaces; secondly, it facilitates the analysis of recorded 
anomalies in relation to occupancy and class schedules of those spaces. This comprehensive approach provides 
valuable insights for maintaining a healthy and optimized teaching environment. 
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4. CONCLUSIONS AND FURTHER DEVELOPMENTS 


The paper illustrated the research conducted under the umbrella of the UniTO's strategic plan, aimed at developing 
an AMS-app to optimize the management of one of the largest university assets in Italy. The replicable 
methodology developed for the definition of the app and its implementation in the organizational system was 
illustrated. The BIM-GIS-BI integration enabled to systemize the various information currently siloed managed, 
providing an integrated decision-support tool accessible starting from the territorial level to the single building and 
component. Through the illustration of several case studies implemented so far on selected buildings, the potential 
of this management system was illustrated, also highlighting the difficulties encountered and the margins for 
improvement. Such a system is easily replicable and showed true potential in being able to manage the available 
resources optimally and consciously. Both in terms of space and economics, it also allows for the optimized 
management of facility services and improved IEQ performance for the end-user experience, leading to effective 
and sustainable management. In the future, information protocols and guidelines will be developed for the proper 
adoption of such an AMS and the correct definition of the IRs. Furthermore, it is envisaged that the management 
processes currently underway will be reviewed in consultation with the heads of the technical-administrative areas 
aiming at the optimal adoption of the system, as well as the implementation of a data-sharing system exploiting a 
data lake from which to extrapolate the targeted information, identified based on the management processes and 
the defined IRs. 
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ABSTRACT: Complex infrastructures such as railway networks face increasing challenges related to resource 
allocation, external events, constraints, and demands. Therefore, it is crucial to optimize the Asset Management 
(AM) phase to ensure the value and functionality of the assets. The integration of Building Information Modelling 
(BIM) and Geographic Information Systems (GIS) can support this phase, but it can only yield benefits with a 
comprehensive approach that considers and addresses the specific needs and resources of the assets and their AM 
organization. The main benefits include improved data management, manipulation, information visualization and 
optimized resource allocation. This study describes an intermediate step towards developing a BIM/GIS 
integration framework for AM that can guide both researchers and practitioners. The framework aims to bridge 
theory and practice by incorporating insights from literature reviews and case studies. Its main objectives are to 
provide a comprehensive multi-stakeholder view and methods for effectively integrating BIM and GIS in this 
context. To develop the framework, the study employed focus groups, interviews, and practical BIM/GIS tests, 
which provided insights reported in this article. Furthermore, the study provides research directions for effective 
BIM/GIS integration in infrastructure AM. 


KEYWORDS: Building Information Modeling, BIM, Geographic Information Systems, GIS, BIM/GIS integration, 
Asset Management, Railway 


1. INTRODUCTION 


Railway networks, like other complex infrastructures, are affected by manifold challenges. Given their significance 
for societal improvement, they must provide increasingly high-quality services (Famurewa et al., 2015) while 
coping with external factors such as extreme weather and resource management (Garmabaki et al., 2021). 
Moreover, railway networks function as intricate systems, requiring adoption of complexity-based approaches in 
order to achieve effective management (Oughton et al., 2018). The improvement of the tools, processes and 
information management during the Asset Management (AM) phase and the Operation & Maintenance (O&M) 
phase is a key factor to address these issues and to implement an effective management of railway networks. O&M 
represents one element of the broader concept of (AM), which is defined as "the coordinated activities of an 
organization aimed at generating value from assets" (ISO, 2014). The O&M phase, being focused on the 
operational and maintenance aspects of the asset is commonly considered as a part of the whole AM phase, in 
which also strategical and tactical decisions about the owned assets are addressed (e.g., investments, risk 
management etc.). In particular, for infrastructure such as railways, a systematic approach is required in order to 
properly manage the assets and to avoid resource waste which would affect the benefits provided to society 
(Almeida et al., 2022). According to the National Institute of Standards and Technology (NIST), approximately 
60% of the total life costs of built assets are accounted for in the O&M phase due to inadequate interoperability, 
leading to considerable wasted resources on information retrieval and poor data management (Gallaher et al., 2004). 


As a technical solution, the integration of Building Information Modeling (BIM) and Geographic Information 
Systems (GIS) has been widely addressed both in literature and in practice. BIM is a widely adopted methodology 
that encompasses the entire AECO/AM sector (Architecture, Engineering, Construction, Operation, and Asset 
Management) and the complete life cycle of a built asset. BIM aims to promote collaborative processes and prevent 
information loss between phases, such as from construction to the AM phase. By means of parametric 3D models 
and standardized workflows and information exchanges, BIM allows to implement digital built environment asset 
management (Re Cecconi et al., 2017). BIM aims to address cost reduction and optimize related tools and 
processes. However, for effective BIM adoption in AM, asset owners and managers (in the role of appointing party 
as defined by ISO 19650) need to carefully assess which information and BIM uses are required. This process 
involves the definition of several requirements, as specified in ISO 19650, such as OIR, AIR, EIR, and PIR 
(Organizational, Asset, Exchange, and Project Information Requirements) (BS EN, 2019). In the context of AM, 
OIR and AIR serve as the primary sources of requirements for delivering the AIM (Asset Information Model), 
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derived from the PIM (Project Information Model). For most AM processes, 3D geometries become less relevant, 
while non-geometric data related to the asset, e.g., warranties, installation dates, etc. are more important. 
Furthermore, organizations managing infrastructures deal with diverse assets, including both punctual buildings 
and horizontal infrastructures like railways, roads, and pipelines. Infrastructural AM can benefit especially from 
BIM/GIS integration, due to the specific need of multi-scale approaches (Breunig et al., 2017). In fact, while BIM 
may provide detailed data about the asset itself, GIS complements it by representing data at larger scales. The aim 
of this research is to investigate and address the needs of railway network management through business-oriented 
BIM/GIS integration for AM. To link literature with practice, the final goal of the entire research is to provide a 
framework based both on current theories and the findings from case studies. The construction of the framework 
was guided by the following research questions: 


Which is the current status of BIM and GIS implementation by organizations in charge of AM of the 
railway network? 


Which are the main benefits and hindrances of BIM/GIS integration for the AM of complex 
infrastructures such as railway networks? 


How BIM/GIS integration should be implemented according to the business core of the organization in 
charge of AM of railway networks? 


At the current state of the research, results from an Italian case study are presented. The subject of the case study 
is RFI (Rete Ferroviaria Italiana), a large public company responsible for managing the railway network in Italy. 
The case study was conducted through Focus Groups and semi-structured interviews. Subsequently, practical tests 
of both BIM and GIS software have been performed and discussed in order to highlight theoretical and practical 
implications of BIM/GIS integration for AM in the railway context. 


1.1 Background 
1.1.1 BUIM/GIS integration 


BIM/GIS integration is a topic that has been deeply investigated in recent decades due to its acknowledged multi- 
purpose potential. A key point of the topic is that methodologies for integration may vary significantly, occurring 
at different levels and with different tools (X. Liu et al., 2017). Moreover, BIM/GIS integration is affected by 
several issues and challenges at the geometric and semantic levels. Several methods, frameworks, and software 
prototypes have been proposed for different applications, such as flood damage assessment (Amirebrahimi et al., 
2015), web-based bridge management (J Zhu et al., 2020), infrastructure asset management (Garramone et al., 
2020), etc. In terms of semantics, a promising approach found in literature is the adoption of semantic web 
technologies, ontologies, and Building Linked Data (Pauwels et al., 2017). Liu et al. (X. Liu et al., 2017) proposed 
a ranking of the several BIM/GIS integration methods classified by EEEF criteria, namely Effectiveness, 
Extensibility, Effort, and Flexibility. Addressing these criteria is crucial because the choice of a BIM/GIS 
integration path depends on the needs of the specific case and context. According to these criteria, semantic web 
technologies have been ranked with a “high” score in Effectiveness and Extensibility, but also a “high” amount of 
effort required for the implementation. These criteria imply a cost/benefit analysis which is necessary for effective 
BIM/GIS integration. Linked to this matter, another recurrent trend found in literature is the almost forced adoption 
of commercial software for effective BIM/GIS integration. In fact, the adoption of ArcGIS PRO is recurrent, along 
with the one-directional approach “BIM to GIS” for data integration (Ma & Ren, 2017). Regarding the complex 
conversion of BIM to GIS files, FME software is also a solution frequently found in the literature (Junxiang Zhu 
et al., 2019). However, important efforts found in literature foster open-source approaches and tools (Jiang et al., 
2019), because they may provide support to address the increasing complexity of projects, the need for better 
interoperability and the need to mitigate costs. Among relevant open-source tools, Cesium is an open platform for 
3D geospatial data that may implement a 3D BIM/GIS environment (F. Liu et al., 2020), as long as BIM models 
are converted to other formats such as .gltf or .obj. The literature shows that BIM/GIS integration is a complex 
and multifaceted topic, which requires an in-depth contextual analysis. For this reason, this research attempts to 
contribute by providing a framework based on knowledge obtained not only from literature but also from specific 
case studies. Besides the technical challenges, BIM/GIS integration is also an organizational cultural and 
competence shift, thus it should be addressed according to the specific needs of companies and involved 
stakeholders. 
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1.1.2 Asset Management and BIM/GIS integration for infrastructures 


When compared to previous phases such as design and construction, AM and the O&M phase are affected by 
peculiar theoretical and practical gaps when related to BIM. One of the reasons is that the object-oriented paradigm 
and the parametric approach provided by BIM authoring software tools are less straightforward to utilize in the 
context of AM. On the other hand, given that AM is facilitated by IT systems, leveraging BIM for automated 
information exchanges holds considerable promise and potential benefits. Furthermore, the primary standard for 
AM, namely the ISO 5500x collection (ISO, 2014), does not directly address BIM methodology. Instead, it relies 
on ISO 19650-3 (BS EN, 2019) as the main reference source. In comparison to GIS, BIM is relatively recent and 
lacks shared and well-standardized paths for AM. Several factors contribute to this situation. Firstly, AM suffers a 
lack of a structured framework of BIM standards and tools (Munir et al., 2019). Secondly, the IFC (Industry 
Foundation Classes) data model has only recently been updated to consistently represent railways with the IFC 4.3 
schema release (buildingSMART, 2022). Lastly, the specification of OIR and AIR poses challenges for asset 
management companies due to unclear role of BIM in supporting their core activities (Hadjidemetriou et al., 2023). 
The conjunction of these factors hinders BIM or BIM/GIS adoption, with the risk to implement an ineffective 
change management from traditional to BIM-based AM (Jupp & Awad, 2017) thus nullifying the benefits of BIM 
adoption and resources invested (Dixit et al., 2019). 


Despite the challenges of BIM/GIS integration, the literature still agrees on its need and expected benefits. For 
instance, BIM-based information exchange and storage standards may ease information retrieval and management, 
meanwhile GIS may provide analysis tool for the whole asset portfolio and its relation with environment and 
surroundings (Wang et al., 2019). However, fully unlocking the potential of integrating BIM/GIS for infrastructure 
AM requires a more in-depth investigation across strategic, tactical, and operational levels. The entire potential of 
BIM/GIS integration for infrastructure AM needs to be further explored at these levels (Garramone et al., 2020). 
Existing literature and available tools illustrate a promising scenario for achieving and effectively implementing 
BIM/GIS integration. To the best knowledge of the authors, in the current literature, organizations' awareness of 
possible benefits given by BIM-based AM approaches and solutions is not sufficiently considered, especially in 
the specific context of railway networks. To address this gap, this research aims to offer insights from an 
organizational perspective while conducting technical evaluations of both commercial and open-source 
alternatives. 


2. MATERIAL AND METHODS 


To answer the aforementioned research questions, a broader research has been undertaken as a multi-step process, 
of which a brief overview is provided in Figure 1. 


Further advancements in next step of 
the research 


oe So 
a 
STEP 1 STEP? 
Literature review 
Cotine research ‘about BIM/GIS 
qa integration 
L i J 


u J 
r 


Prewous step of the research Steps addressed in this article 


Figure 1 Overview of the multi-step research. 


The work presented in this article follows a systematic literature review (SLR) concerning BIM/GIS integration 
(Mangia et al., 2022). Building upon the findings from this initial step, the focus was subsequently narrowed down 
to a specific life-cycle phase and asset class, namely the AM phase and transport infrastructures. Following this, 
two case studies (involving RFI and Trafikverket, respectively) have been conducted to answer the research 
questions. In this work the RFI case study is presented, and case study methodology is defined as an in-depth 
investigation of a particular subject, such as a group, organization or phenomenon in a real-life context (Crowe et 
al., 2011). The case studies have been addressed by means of four main activities: 


1. Data collection by means of semi-structured interviews (SSIs) and focus-group; 
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2. Data processing; 
3. Tests and experiments with several BIM and GIS-based software; 
4. Evaluations of the key elements of a framework for business-oriented BIM/GIS integration. 
Data collected in activity 1 are mainly related to the following two topics: 
Existing AM, GIS and/or BIM systems employed by the company; 


Awareness of benefits obtainable from BIM and BIM/GIS integration for the business core of the 
company. 


Activity 1 was carried out across multiple sessions. The Focus Group method was selected as it facilitated the 
involvement of RFI departments interested in BIM/GIS integration and allowed confirmation of the authors' 
hypothesis: "BIM is not yet a well-established tool adopted in the core business and it lacks a standardized 
integration approach with existing systems." The focus groups engaged personnel from various RFI departments 
that could potentially be impacted by BIM/GIS integration, such as AM/ERP system users and administrators, GIS 
users, and others (Table 1). The researchers, acting as focus group facilitators, were able to provide a common and 
shared understanding of BIM/GIS integration opportunities and limitations and to receive feedbacks from different 
perspectives. 


Table 1 List of participants of focus groups and interviews. 


Department Executive Maintenance InRete.2000 MUIF support Asset 
Manager management system support Management 
Interviewees N°l N°2, N°3, N°4 N°5, N°6 N°8, N°9 N°10, N° 11 


In addition to this, semi-structured interviews were conducted with each business unit to delve deeper into the 
investigation and to pose specific technical and organizational inquiries to each interviewee. The “Data processing” 
activity involved the analysis of the information retrieved from the Focus Groups, interviews and related 
documentation provided. Knowledge about company-level standards, demonstrations of existing systems and 
datasets were provided for processing. This led the authors to the “Tests and experiments” activity, in which a 
series of exploratory experiments with several BIM and GIS software and tools were performed. The objective of 
these tests was to identify and assess a list of “key-elements” which the framework should address (i.e., the fourth 
activity of this research). For the tests and experiments, QGIS was employed for inspecting and extracting data 
from the geodatabase provided by RFI. Autodesk Revit and Bentley OpenRail Designer were used as BIM 
authoring tools to create simple 3D models of different types of assets (such as buildings, railway tracks, and 
sidewalks). Autodesk InfraWorks was utilized to present a 3D BIM/GIS environment for Proof of Concept (POC) 
purposes. The IFCjs and ifcopenshell software libraries are currently under test in order to extract data from IFC 
BIM models and evaluate web-based BIM/GIS viewers. Taking into consideration the outcomes of focus groups, 
interviews, and software tests, the final step of the broader research will concentrate on developing the framework 
for BIM/GIS integration for AM. 


3. RESULTS 


In this section the results obtained in the scope of the Step 3 and 4 reported in Figure | are reported. These results 
provided the conceptual and practical foundations which drive the ongoing development of the framework (i.e., 
Step 5) discussed in Section 4. 


3.1 BIM potential for existing systems 


One of the pivotal results for the development of the framework is the identification and analysis of existing 
systems. This information is one of the two main results retrieved from the Data Collection and the Data Processing 
activities. The second main result is addressed ins sub-section 3.2. RFI adopts two primary information systems 
for AM that have potential for integration with BIM. The authors were presented with comprehensive 
demonstrations of these systems during the interviews. The examination of the systems currently utilized by RFI 
supplied essential insights for assessing BIM/GIS integration options and addressing the initial research question. 
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A schematic overview of the two main systems is provided in Table 2. 


Table 2 Overview of the two RFI system investigated. 


System Type Data involved Tasks performed 
InRete.2000 ERP SeTe’s data model, master data - Translation of infrastructure projets into a railway 
sheets network model composed of locations and routes. 


- Censorship the railway network assets by means of a 


compiled master data sheet (e.g., train stations, railways). 


- Management and maintenance tasks of every asset of RFI 
(e.g., asset is in function or surpressed, failure 


management etc.). 


MUIF WebGIS Geodatabase consisting of 2D - Context and asset visualization at the macro-, meso- and 
GIS layers, DTMs micro-scale. 
photospheres and 3D - GIS spatial analysis (e.g., buffer zones). 


pointclouds. batts Soho ; : 
- Bi-directional linkage with other RFI systems for AM and 


O&M (e.g., route interruption). 


The first system is InRete.2000, a customized version of SAP AM software. It is an Enterprise Resource Planning 
(ERP) software which supports the management and maintenance of the railway network infrastructure. Based on 
RFI data model, assets managed with InRete.2000 are represented by means of two entities, namely called “Sede 
Tecnica” (SeTe) and “Equipments”. SeTe entities serve to represent spatial structures or components that require 
maintenance, such as train stations and tracks. "Equipments" refer to physical objects installed within SeTes. Each 
SeTe and Equipment is assigned an ID within InRete.2000, referred to as the "Code of Sede Tecnica," which 
establishes semantics and hierarchy among the assets. Information within a SeTe is populated through on-site 
surveys, manual checks, and operator input. A SeTe is composed of sets of data and metadata, such as its location, 
working status, maintenance activities etc. In InRete.2000, a SeTe’s record acts as a master data sheet for the 
respective entity. The hierarchical decomposition of SeTes mirrors the network model adopted by RFI. In particular, 
the railway network (which is a SeTe of first level) is characterized by two main elements: “Localita” (Locations, 
code LO0000) which constitutes the nodes and “Tratte”? (Routes TR0000) as the edges of the network. 


Presently, InRete.2000 lacks geometrical and geographical visualization for SeTes, which is instead provided by 
MUIF (Modello Unico dell'Infrastruttura — Unified Infrastructural Model) in the form of a web-GIS application. 
MUIF is a long-term project initiated to establish a common information system supporting the business logic of 
each department of RFI. MUIF encompasses information about all assets within the rail network managed by RFI, 
facilitating the tracking of related data, visual representation of asset physical aspects, and verification of their 
geographic locations. The geodatabase predominantly comprises shapefiles, Digital Terrain Models (DTM), 
photogrammetric data sources like point clouds and orthophotos. Both InRete.2000 and MUIF are firmly 
established as essential tools for RFI, supporting their core business functions. These systems enable activities 
such as failure management, maintenance orders, route interruptions, and more. The former is employed for the 
management and maintenance of the railway network, and at the current state it is bi-directionally linked with 
MUIF by means of the hierarchical ID named “Code of Sede Tecnica”. In addition, MUIF users may inspect a part 
of the railway network by means of photo-spheres and point clouds as shown in Figure 2. However, the 2D maps 
and the 3D point clouds are displayed in two distinct frames within a browser page, thus a unified 3D web GIS 
environment is not implemented yet. According to the interviewees, BIM holds potential for integration with 
existing systems, since it could significantly improve several processes such as context inspection and information 
retrieval. With BIM, detailed asset-level 3D models and information could be readily accessible, both for large 
entities as SeTes and for small ones like Equipments, which can be challenging to represent in MUIF despite their 
presence in InRete.2000. Moreover, the hierarchical data model of assets managed can be reflected in BIM 
components with dedicated attributes, which needs to be specified by RFI in its AIR. Working as an ID, these 
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attributes may also partly overcome the needs of semantic which will be provided by the release of the new IFC 
4.3 version. 


Figure 2 Screenshot of MUIF point clouds and photospheres to integrate the 2D GIS environment. 


3.2 BIM and GIS state of the art in the organisation 


The interviewees provided a comprehensive and multi-perspective overview of the current status of BIM and GIS 
within the organization. From the analysis of the Italian case study, as the second main result of the Data Collection 
and Data Processing activities, the researchers identified key concepts that need to be considered for the 
development of the framework: 


BIM: While existing systems have not yet fully integrated BIM, the organization actively participates 
in various BIM pilot case studies and work groups to test and implement BIM led by buildingSMART 
initiatives; 


The existing systems are undergoing continuous strategic development, making disruptive software 
changes impractical. Therefore, BIM should be integrated into the existing systems without severe 
changes to the system architecture; 


Several commercial vendors of AM systems already offer BIM-plugins in the AM environment, 
including SAP; 


Asset management personnel currently lack autonomous access to relevant data and technical drawings 
of assets (e.g., plants, sections), where AIM CDE linkage could provide support; 


BIM data and models can enhance several manual processes, such as InRete.2000 datasheet filling and 
on-site inspections; 


Organizations involved in AM of infrastructures are typically large, and implementing changes and 
processes can be costly and time-consuming; 


Vendor-agnostic approaches for information exchange, like OpenBIM, are vital since these 
organizations will mainly receive BIM-based data in open formats such as .ife or COBie-compliant 
datasets. It also supports BIM/GIS integration thanks to IFC model conversion to 3D GIS formats; 


The prevailing notion regarding BIM models is that they become static data sources stored in the AIM 
CDE after project handover. However, there is potential for dynamic BIM utilization in the AM context, 
involving data management and manipulation tasks. 


To support these conceptual foundations, the authors conducted experiments and tests to gather insights on how 
BIM and GIS data can be effectively managed according to business needs, current system limitations, resources, 
and requirements. 


3.3 Test and experiments on software applications and tools. 


Throughout the design and handover phases, specialized tools are employed to facilitate iterative and extensive 
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data manipulation activities for generating BIM models. The authors sought to explore the applicability of these 
tools for Asset Management (AM) purposes and conducted tests on two software solutions: Autodesk Revit's 
Dynamo plug-in and the "Asset Manager" tool in Bentley OpenRail Designer CE Edition. Both tools enable users 
to carry out batch operations, including property set and properties creation, parameter updates, and more. Dynamo 
adopts a Visual Programming Language (VPL) with a graphical user interface (GUI) to facilitate script 
development, although some level of programming familiarity is still necessary. Conversely, the "Asset Manager" 
tool follows an approach more aligned with traditional AM systems and user experience. It employs pre-structured 
Excel files, allowing users to batch assign property sets and properties to the necessary entities. This tool expedites 
the rapid incorporation of especially pertinent data for integration with InRete.2000, such as the "Sede Tecnica" 
ID and the class code. The tests began with the creation of a BIM model of an actual location using data and 
documentation provided by RFI, which included a geodatabase (.gbd) containing point clouds, 2D shapefiles of 
the asset, digital terrain models (DTM), and orthophotos. Furthermore, RFI guidelines and the class database of 
the “Sede Tecnica” classes were made available. These tasks were conducted in the “BIM model creation” step 
shown in the overall workflow is summarized in Figure 3. Once the BIM models have been developed, the authors 
wanted to employ them both with commercial software (i.e., Autodesk Infraworks) and open-source tools 
(ifcopenshell, IfcJS). In this article, we acknowledge that only the workflow “BIM/GIS viewer POC” is introduced 
and discussed; however, a comprehensive discussion of the “BIM web-based viewer and AM module” workflow 
is intentionally omitted because it is still in development and to avoid an excessive length of the article. 


30 BIM'GIS viewer POC 
30 BIM:GIS 
T environment 
BIM web-based Viewer and Asset Management Module 
¢ 2 + IFC data extraction and editing seriot | 
-> 4 x a E 
Opensource web-Dased BIM viewer } 


Further advancement in next step 


Figure 3 Workflow of the different software and tools tests performed in this study. 


Shapefiles provided the asset footprint and alignment, along with coordinates and InRete.2000 data. These data 
were used to develop the BIM models. Buildings were modeled with Autodesk Revit (2022-2023 version), with 
simple architectural models linked together to provide an overview idea of a set of contiguous assets. From QGIS, 
as shown in Figure 4, the “Info Project” pop-up window is shown with the two most important data, namely the 
ID of the “Sede Tecnica” and the class code of the InRete.2000 data model. To replicate the attributes and values 
of the data from the shapefile attribute table to the BIM models, several paths can be undertaken after the export 
of the table in a spreadsheet. 


Figure 4 Shapefiles in QGIS (left picture) and derived BIM models (right picture.) 


In Figure 4 is also reported a 3D view of the model, with the other developed models attached to check the 
correctness of geolocation data. Data at “asset-level” has been assigned to the “Info Project” entity in Revit. During 
the export to IFC files, information about the main SeTes representing the nodes and the edges of the railway 
network can be associated to the IfcProject or IfcBuilding entity. To handle the large amount of data that should 
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be added to or extracted by BIM elements, batch parameters procedures are an opportunity to save time and reduce 
errors. Shifting to Bentley OpenrailDesign CE, the same process has been performed for railway tracks using the 
Asset Manager pre-structured excel files. These two workflows were shown to the interviewees, for feedback 
collection. Both the aforementioned processes require commercial software, even if an IFC file is used. These 
kinds of tasks are typically performed by designers (usually not an RFI employee). However, the authors are 
confident in the idea that companies like RFI may adopt similar tools for the management of data inside their AIM 
BIM models. In fact, these tools provide semi- or automated procedures for data extraction and manipulation which 
can ease the link of BIM models with InRete.2000 and MUIF. For this reason, the IfeOpenShell and IfcJS library 
are also under test to implement sample scripts for extracting, validate and store data in JSON or CSV format for 
information exchanges with InRete.2000 and MUIF, based on REST API protocols. Compared to commercial 
software, these libraries allow to develop bespoke script for the extraction and manipulation of data from IFC 
models. However, this improved flexibility and interoperability requires dedicated team of developers compared 
to “out of the box” commercial software, and thus companies such as RFI needs to evaluate which alternative best 
suit their needs and capabilities. 


In addition to the criticalities regarding information management and exchanges, interviewees raised an issue 
regarding the limitations of MUIF for precise measurements and inspections despite its ability to visualize 2D GIS 
layers, point clouds, and 360° photos. They expressed the need for a more reliable tool, such as BIM models, to 
facilitate indoor inspections, object data retrieval, and accurate measurements. For this reason, two alternative 
integration pathways were discussed. The "basic" integration involves BIM models being integrated into MUIF 
similar to point clouds and 360° photos, connected to the 2D GIS environment via a URL inserted in an attribute 
field of 2D GIS shapefiles entities. The "basic" integration, therefore, involves a process-level integration where 
BIM and GIS data are not manipulated or converted into each other's formats. However, according to the 
interviewees, this integration can already enhance the aforementioned tasks. Conversely, an "advanced" integration 
would establish a unified environment where 3D BIM and GIS geometries are visualized together. This advanced 
integration could be achieved through open-source tools like CesiumJS or QGIS, as well as commercial software 
options like ArcGIS PRO and Autodesk ESRI GEOBIM. For the purpose of this research, Autodesk InfraWorks, 
as depicted in Figure 5 was selected because it readily provided a proof of concept for a 3D BIM/GIS environment. 
The interviewees conveyed that such an environment would yield significant benefits to their core business tasks, 
providing enhanced context visualization, multi-scale dimensioning and data aggregation at larger scales. However, 
despite recognizing the advantages of the "advanced" approach, the interviewees exhibited a stronger interest in 
the "basic" integration due to its easier implementation. While the "advanced" approach was seen as a desirable 
future goal, the interviewees identified complex change management efforts as the main hindrance to its company- 
wide implementation. 


Figure 5 Infraworks 3D BIM/GIS environment for proof of concept. 


4. DISCUSSION 


Upon the findings of the Focus Group, interviews and experiments, two relevant points of discussion were 
identified. The first one focuses on the technical aspects of the BIM/GIS integration for infrastructure AM, 
discussing two alternatives which could be implemented in the short- or mid-term. The second point is about the 
organizational aspects, which according to the authors should require a deeper investigation in future works. 
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4.1 Pathways of BIM/GIS integration for AM 


For the “Test and experiments” of this study (sub-section 3.3), a limitation is that Semantic Web Technologies 
have not been tested. This is due to the fact that RFI currently relies on relational databases, and transitioning to 
more advanced tools like graph databases and ontologies may present a challenging change management effort 
while a full BIM adoption is still in progress. According to the outcomes of the “Test and experiment” activity, the 
development of custom scripts and software to link the BIM AIM CDE with existing systems could prove 
advantageous for AM activities by reducing error-prone manual data management. The interviewees expressed 
positive feedback regarding several functionalities available within BIM authoring tools, particularly the Visual 
Programming Language (VPL) features of Dynamo. As a result, the authors suggest that a potential AM-specific 
BIM solution could incorporate VPL or similar tools as one of the possible modules to enable guided scripting for 
AM purposes. 


Regarding the second research question, it was found out that the answer hinges on the chosen approach and thus 
it is strongly correlated to the third research questions (i.e., how to integrate BIM and GIS). For these reasons, in 
this work two approaches are considered, named “basic” and “advanced”, yielding distinct outputs in terms of 
benefits and hindrances. The "basic" integration primarily involves linking BIM models and shapefile attributes 
with tailored scripts for data management. This integration allows for converting simple data and geometries from 
BIM models into the GIS system, such as footprints and project-level information. In the MUIF web-environment, 
BIM models can be accessed as a separate frame by clicking on the GIS 2D representation. Despite its simplicity, 
this level of integration already provides benefits for maintenance tasks such as census activities and on-site 
inspections. For asset management and maintenance tasks, non-geometric data are commonly related to the asset 
itself and does not require an intensive integration with territorial data. However, efforts and resources must be 
dedicated to developing solutions for managing non-geometric data and handling sets of properties across multiple 
systems, using the "SeTe ID" as the matching key. Following an OpenBIM approach, a BIM-viewer inside MUIF 
web application can be implemented by means of IFCjs or ifcopenshell libraries avoiding intensive rework inside 
the existing system. 


Conversely, the "advanced" integration unlocks the potential for converting conventional 2D spatial analysis into 
a dynamic 3D realm, seamlessly incorporating attributes like elevations, building stories, BIM components, and 
more within a comprehensive 3D BIM/GIS environment. The primary advantage of this approach lies in the 
amalgamation of asset-specific and territorial data within a single interactive environment, enabling querying and 
management capabilities across diverse assets, such as multiple BIM models of railway bridges. In contrast to the 
"basic" integration, where each BIM model resides within its individual window frame, the "advanced" approach 
facilitates asset-specific data analysis on multiple models coherently. Moreover, 3D BIM/GIS visualization 
contributes to increased awareness of the impact of the asset in the environment compared to its footprint on a 2D 
GIS map. Thus, the “advanced” approach allows for a comprehensive 3D model of the railway network assets 
alongside other assets (e.g., from third party sources) and digital terrains. However, its implementation is more 
complex and costly, especially with open-source approaches, due to the technical pipelines and workflows required 
to utilize BIM data in a 3D GIS environment. There exists a clear trade-off in benefits between the "basic" and 
"advanced" integration. While 3D BIM/GIS models enable 3D spatial and data analysis, achieving it demands 
intensive efforts with open-source solutions or the adoption of commercial software. In perspective of the 
development of the framework, a “integration layer” should be conceptualized to highlight that the choice of a 
BIM/GIS integration approach directly influence the “business layer”. 


It is also important to consider the recent advancements of the latest version of the IFC schema, i.e., IFC 4.3. Even 
if crucial for semantic interoperability between BIM-based software, it is worth noting that in the AM context the 
link with other systems may be driven by specifying an attribute which allows companies to implement its 
company-level data model or classification systems. In the case study performed with RFI, the “Code of Sede 
Tecnica” attribute implemented in the BIM model act as a global ID of the asset throughout the other systems such 
as MUIF and InRete.2000. This approach can ease both the implementation of the “basic” and “advanced” 
integration, since it enables a certain degree of interoperability even if several elements in the BIM modeles needs 
to be exported in IFC files as IfcBuildingElementProxy entities. 


4.2 BIM/GIS integration insights for framework development 


Given the expansive nature of BIM/GIS integration for AM, it is essential to approach it at the organizational level 
as a systematic, step-by-step process, delineated into "modules" or "key functions". This segmentation allows for 
the assessment of enabling technologies, prioritization, benefits, challenges, and other pertinent aspects for each 
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module. For instance, one module could pertain to the "BIM/GIS viewer," evaluating its necessity and applications. 
In this context, the aforementioned "basic" integration would enable swift asset location on GIS, while inspections 
would be conducted using a BIM-only viewer. On the other hand, assessing how a BIM model of a railway route 
interacts with train stations necessitates a 3D BIM/GIS viewer, as envisaged in the "advanced" integration. Another 
illustrative module could be "AM data analysis," aiming to empower BIM/GIS-based business intelligence and 
data analysis tools. The adoption of this modular approach mandates the formulation of a well-considered change 
management strategy that aligns with existing information systems, processes, and staff competencies. Without 
such a strategy, companies might choose counterproductive BIM/GIS integration solutions. While advanced 
solutions may seem preferable, adopting a modular mindset enables companies to opt for a cost-effective 
"BIM/GIS viewer" module while concentrating greater resources on other modules. Therefore, concerning the 
third research question, organizations should strive to associate the benefits of modules with particular tasks, such 
as utilizing a 3D BIM viewer for asset-wise measurements or employing a 3D BIM/GIS viewer for context-wise 
measurements. The framework currently in development not only aims to provide a module-based view of the 
problem, but it is also linked to the primary concerns and needs emerged from the semi-structured interviews and 
Focus Group outlined in the sub-section 3.2. It’s worth noting that both the “basic” and “advanced” does not serve 
as substitutes for MUIF or InRete.2000. Furthermore, modularity enables changes to be implemented 
incrementally. For example, if only a BIM-based data exchange for InRete.2000 is required, it can be developed 
without the need for investment in a BIM/GIS viewer. However, it is important to emphasize that these assumptions 
hold true if there is a comprehensive understanding of the existing systems employed, as they will inevitably 
impact the effectiveness and significance of BIM/GIS-based modules and tools in relation to business activities 
and objectives. 


As a future work, the authors aim to embed this concept in the development of the framework, highlighting this 
“modularity by design” approach for the specific case of RFI as a novel contribution to the current body of 
knowledge. This approach is intended to provide the framework with a certain degree of generalization, since it is 
also meant to be a replicable tool for asset owners responsible for other kinds of infrastructures. In the AM phase, 
BIM is addressed by means of AIM. Since they are considered the backbone of Digital Twins (DT) (Lu et al., 
2021), the framework is also intended to be extendable with modules regarding other technologies and tools such 
as Internet of Things (IoT), Machine Learning (ML) and Virtual/Augmented Reality (VR/AR) (Johansson & 
Roupé, 2022) which could uplift AM activities. However, these strides necessitate preliminary steps, and BIM/GIS 
integration is among the most intricate. Compared to previous researches, the ongoing work presented in this 
article aims to provide a connection point between the advanced proposals found in literature (e.g., semantic web 
technologies, brand new BIM/GIS systems) and the short- and mid-term needs of organizations involved with AM. 
Hence, this work addresses the second research question by offering guidance and assessment on implementing 
changes within existing systems. As a theoretical implication, this research aims to contribute providing two 
research directions. First, a BIM/GIS integration approach specific for AM should take in consideration the 
feasibility and the concept of “modularity” and to “innovate with the lowest degree of changes required to the 
overall existing system architecture". The second research direction is related to the analysis of specific BIM 
requirements for AM software features and the definition of core skills and needs of a “AM-BIM specialist”. 
Unlike prior phases, the escalating significance of non-geometric data, the pivotal role of open non-proprietary 
formats, and the dynamic nature of working with AIMs highlight the need for a professional role currently 
undefined. Furthermore, while BIM authoring tools in earlier phases evolved naturally from preceding tools (e.g., 
AutoCAD), BIM/GIS-based AIM software poses challenges as it requires integration into existing AM systems. 
In light of this, the authors recommend investigations into change management for AM-specific BIM/GIS 
integration, spanning tool prerequisites and professional roles encompassing competencies, core skills, and 
training. 


5. CONCLUSION 


The work presented in this paper marks an intermediate stage within an ongoing research endeavour focused on 
the development of a BIM/GIS integration framework for efficient assets management in the railway context. 
Starting from this, the framework will be enriched by insights derived from literature, two case studies, and 
experiments with both open-source and commercial software. Tests involved the creation of simple 3D BIM 
models from existing data sources, batch data manipulation and BIM/GIS representation alternatives. As a future 
work, the authors plan to extend the applicability of the framework to complex infrastructures beyond railways. 
One limitation of this current work is its reliance on the perspective of an Italian case study; thus, a future Swedish 
case study is being developed to enable a comparison and generalize the framework's applicability. Another 
limitation lies in the exclusion of widely discussed software like ArcGIS PRO and FME. Instead, Autodesk 
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Infraworks was selected for the purpose of the proof of concept. Additionally, the company's unfamiliarity with 
semantic web technologies restricted the inclusion of this technology in the study. 


The results of this work are geared towards contributing to two distinct research directions. Firstly, pertaining to 
the technical facets of BIM/GIS integration for infrastructure asset management, the suggestion is to explore 
alternatives that can be readily comprehended and implemented by companies. This implies that BIM-based 
solutions should be approached more as adaptive tools designed to seamlessly integrate with existing GIS and AM 
systems. This is preferable to introducing disruptive, entirely new systems that would necessitate significant 
investments and comprehensive system overhauls. The second research direction focuses on organizational aspects. 
Change management emerges as a pivotal factor in BIM/GIS integration and should be closely aligned with the 
operational, tactical, and strategic levels of asset management. Several tools and procedures can be tailored from 
BIM software employed in earlier phases to benefit asset management. However, this adaptation necessitates a 
profound comprehension of the organization and may warrant the establishment of novel professional roles, such 
as AM BIM specialists. The final goal of this work is to contribute to the body of knowledge by addressing this 
multidimensional problem suggesting “modularity” as a key concept for BIM/GIS integration-based AM 
frameworks. Regarding this, future works in this research will address the complexity of the technical alternatives 
declined with the capabilities (i.e., resources, staff training, tools etc) of organizations. It is important to note that, 
without effective change management, companies might encounter counterproductive BIM/GIS integration efforts, 
which could substantially impede investments intended to enhance the quality and functionality of critical and 
complex infrastructures, such as railway networks. 
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ABSTRACT: Enterprise resource planning (ERP) is an integrated business management system aimed at 
monitoring and maximizing resources and efficiency; on the other hand, Building Information Modelling (BIM) 
represents a broad series of approaches to design, based on the development of virtual models that cover the 
building s whole lifecycle. The integration of ERP systems within the Architecture, Engineering and Construction 
industry, while promising, has yet to reach the same results that its use has achieved in other fields. Although BIM 
and ERP are traditionally systems employed in different disciplines, they both deal with data integration and 
customization, and are designed to reconcile varied and scattered information. A mutual incorporation could allow 
for a more comprehensive understanding of the project starting from the initial phases, while also granting a more 
streamlined construction process and a reduction in errors and complications later on. The aim of this paper is to 
identify the possible connections between the two systems examining a case study, starting from an analysis of the 
current state of the art regarding this implementation, and by evaluating both the existing limits and the future 
possibilities of this implementation, for both small and medium enterprises (SMEs) and the industry at large. 


KEYWORDS: Enterprise Resource Planning (ERP), Building Information Modelling (BIM), Small Medium 
Enterprises (SMEs), AEC, Construction, Data integration 


1. MANAGEMENT INFORMATION SYSTEM AND ENTERPRISE RESOURCE 
PLANNING 


1.1 Brief history of ERP 


Enterprise Resource Planning is a type of business information system meant to manage and monitor all the 
resources within an organization. Introduced in the 60s within the manufacturing industry as Material Requirement 
Planning (MRP) (Lee, Arif, & Halpin, 2002), the system mainly focused on the storage and allocation of materials. 
In 1975 IMB then developed the Manufacturing Management and Account System (MMAS), now considered the 
true precursor to ERP (Jacobs, 2007): the system was aimed at creating ledger postings, job costings and related 
forecasting updates based on the inventory status and order transactions, while also generating orders using a 
standard Bill Of Material (BOM). 


The consequent technological growth enabled, during the 80s, the development of the Manufacturing Resource 
Planning (MRP II) system, which allowed the integration of functions related to human resources and marketing, 
as well as the management of all financial and accounting information (Kumar, & Van Hillegersberg, 2000). The 
ability to manage and update orders, inventory and financial transactions in a single system, allowed the companies 
to replace and centralize multiple typically stand-alone systems with a single software, resulting in easier 
communication between different areas and the automatization of data sharing. 


However, the term ERP wasn’t coined until 1990 by Gartner (Katuu, S. 2020), in order to describe a new generation 
of unified systems that could manage different departments within the same company, and that were based on the 
standardization and integration of processes within a single, shared database. This new type of management 
software also dealt with back-office data, financial transaction, marketing, and all the information related to all the 
different production stages, from planning and procurement to transportation and delivery. A rapidly growing 
business, ERP sales crossed the $10 billion mark in 1998 (Shi, & Halpin, 2003), with companies like J.D. Edward, 
Oracle, PeopleSoft, Baan and SAP surpassing IBM and positioning themselves as leaders in the ERP market. A 
critical factor in the success and spread of the software was also the so called Millennial Bug, or Y2K Problem, 
which was thought to cause global electronic damage due to how programs used to format year dates; in order to 
prevent possible informatic problems at the switch of the millennium, many companies took advantage of the 
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necessity to update their own software and implemented ERP systems, cementing the software’s consolidation in 
the industry. 


Despite a promising start and initial accounts reporting a Return On Investment ranging from 30 to 300% within 
a year after installation (Shi, & Halpin, 2003), many failures were also accounted: the main problems were mostly 
related to the initial costs of purchase and installation - which, at the time, varied from $2 million to $130 million 
(Ross, 1999), and the time needed for full implementation, as well as the likelihood of a performance dip during 
the stabilization phase of the software, which was expected to typically lasts for four to 12 months. By 2002 all 
main ERP distributors faced a significant stock plummet: Baan had fallen off by then, and J.D. Edwards and 
PeopleSoft opted for a merger between the two companies, since there was very little overlap between what the 
two software offered and the joint venture was thought to, in this way, manage to exceed SAP and Oracle (Jacobs, 
2007); nevertheless, Oracle itself took over the merger after a few days with an hostile takeover, leaving itself and 
SAP as the main distributors. 


In the meantime, a new system was being developed: Extended ERP, or ERP II, not only allowed for real time 
access to information and immediate data sharing between members of the company, but also had managed to 
integrate two other software, Supply Chain Management (SCM) and Customer Relationship Manager (CRM), 
which could manage interactions with suppliers regarding procurement and transportation, while also handling 
clients’ data (Al-Amin, Hossain, Islam, & Biwas, 2023). The advent of the digital era, paired with the difficulty 
for Small and Medium Enterprises (SMEs) to bear the software’s implementation and managing costs, resulted in 
the development of Cloud-based ERP systems: in this case, the system runs on the provider’s cloud platform as 
opposed to an on-premises hardware that needs to be handled by a IT team employed by the company. Traditional 
ERP software, developed for an easier but less flexible management than what the market requires today, have 
been proven to be insufficient in handling the complex internal processes required within an enterprise, as they are 
not able to grant the same streamlined integration that a cloud system can offer. 


Many possibilities are viable today considering the advancements of technologies within the 4.0 Industry, one of 
which could be the development of ERP Mobile platforms, accessible from smartphones through internet 
connection, and that could allow for easier and instant access to information and more customization possibilities. 
The introduction of AI and Machine Learning could also quicken processes, thanks to the prediction of inventory 
status and the atomization of repetitive processes, as well as error predictability; the integration of IoT solutions, 
such as smart sensors, could, too, facilitate the monitoring of materials and equipment within the supply chain. 


1.2 ERP within the construction industry 
1.2.1 State of the art 


ERP systems are currently used by construction companies in order to: 


Improve customer response time; 
Improve relationship with suppliers while also strengthening the supply chain all together; 
Increase organizing flexibility; 


Reduce time related to decision making processes, as well as the competition times and related costs. 


Considering the great level of customization that this kind of management system can offer, it might seem 
impossible to define a standard ERP software for an industry like the AEC (Architectural, Engineering and 
Construction) one, which is incredibly fragmented and characterized by different areas of work, each one of which 
has its own particular needs, usually related to the specific project currently being worked on. As Helo and Szekely 
(2005) mention, many are, in fact, the benefits that an ERP system can provide, such as the possibility to 
immediately develop a Master Production Schedule which allows sales orders and forecasts, the creation of 
purchasing orders for suppliers and production orders for plants, the constant update of inventory statuses updated 
depending on procurement and delivery status, and tracking of financial records of both the customers’ orders and 
the company’s internal status regarding payroll and suppliers payments. Integrating an ERP system inside the 
company can help manage all this information within one single centralized software, allowing for a more 
streamlined sharing of data, which can prevent errors or duplicated records during the various stages. This implies 
a sort of standardization of processes that, while it can result in more transparency and rapidity during the processes 
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and, all around, a general improvement of performance, clashes with the nature of the construction industry. 


Multiple attempts have been made in order to push the industry towards the same levels of efficiency that this kind 
of implementation brought to other branches, while failing to get the same results (Gavali, & Halder, 2020): the 
outcomes achieved by sectors that rely on standardized mass production such as the production and manufacturing 
industry — industries ERP systems where developed by and for (Kumar, & Van Hillegersberg, 2000) — seem to be 
unrealistic for construction companies, which work instead on a project by project basis and where every case is 
characterized by specific needs related to not only the requests of the client, but also are dependent on the different 
and multiple teams working on the project itself, which vary often and might end up working together for the first 
time. As Yang et al (2007) noted, the Construction Managements packages offered by commercial ERPs cannot 
provide a once-and-for all model for all industries, much less for construction firms, especially considering how 
the level of digitalization and use of IT solutions within the industry is still very low and far from the rationalized 
and mechanical nature of other sectors. 


Though, as of today, this type of technology cannot manage a reality so fragmented and unpredictable nor the 
increment of complexity within projects, the flexibility required by 4.0 Industry, and the recent development of 
technological advancements, require the establishment of a new type of interoperability between systems and a 
new way of designing and planning. 


1.2.2 Construction Enterprise Resource Planning 


According to Augenbroe (2006), an ERP system can be divided in two interfaces, one related to the project and 
one to the financial aspects: the development of a stable and clear link between the two — granted by a common 
access to the financial data, project data and customer data - is essential in order to allow a more streamlined 
planning and decisional process. In the construction industry in particular, the system should be project-oriented, 
since the project itself is the cornerstone of the financial activities: in order to ensure the handover of the 
construction within the established timeframe and costs, an efficient organization of the ERP system is needed. 


As explained by Dudgikar et al (2012), the software allows the company to manage its own resources, split into: 


Manpower, meaning the definition and planning of the workforce, as well as identifying the teams and 
the skills required, especially if the work requires the introduction of subcontractors and their crews 


within the project; 


Materials, which groups all processes aimed at materials planning, programming, and their purchase, 


inventory control, materials transportation and handling on site; 


Machines, which handles, for example, the acquisition and management of all the equipment, and the 


identification of the tasks to be undertaken by said mechanical equipment; 


Money, including financial forecasts, project budget and cost control measures. 


The company, then, needs to coordinate not only internal resources (workforce, equipment, inventory ...) which 
can be handled directly, but also various external influences (suppliers’ transactions, subcontractor relationships, 
market situation ...) from which it depends on. Whoever is tasked with the decision-making process has then to 
manage a vast quantity of information that belongs to different areas, and that must also be properly recorded and 
shared. 


Shi & Halpin (2003) developed the so called Construction Enterprise Resource Planning (CERP) system, an 
internet-based framework that develops in three tiers: one related to the User Interface, related to clients’ 
management and data, one related to the Management Servers, which provides administration and the link between 
the other two levels, and one related to Applications, which includes the database of the system and all the project 
and materials information. This type of system can offer immediate access to all the data, granting better 
performance and quickness within the decisional stage; by being a cloud-based system, this system also supports 
a more transparent and collaborative process, as all information is stored within the system itself and is readily 
available to everyone, and can be traced to the person tasked with the transaction or the decision made. 
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The main obstacles of these applications can still be ascribed to the costs and time required for implementation 
and development of the systems, and the lack of software and modules that can handle the complexity of the 
processes typical of the construction industry. While the aim of these systems is to standardize and automate the 
workflow, so that every individual can access and use them easily, the landscape of this industry is still too 
fragmented and requires a level of specificity often incompatible with these goals. 


1.3 ERP-BIM Integration 


A potential integration between ERP software and BIM models could be one of the viable solutions aimed at 
facilitating the implementation of ERP system in construction companies. The status of this integration is, 
nowadays, still at the embryonal stage: the first pursuits focused on the attempt to connect the manufacturing data 
to the information related to the construction processes by exploiting 2D CAD files, but failed to define an efficient 
information-centric approach that could establish a proper integrated work frame (Holzer, 2014). 


Within the multiple causes of these undeveloped integrations between the two systems, the main one can be traced 
to the slowness of the AEC industry in implementing solution related to the digital management of the designing 
and planning stages: the creation of digital and parametric virtual BIM models is still too underdeveloped in many 
companies, and rarely this way of modelling also takes into consideration the procurement or the construction 
process. Particularly, SMEs are especially the ones struggling with the development of these systems, mostly 
because of the lack of properly developed IT systems, which would allow for a more digitized process. As Ghosh 
et al (2011) point out, the time and costs implementation and management of these software is, once again, the 
main cause, as well as the lack of investments in proper staff training. Considering also the typical traditional and 
conservative outlook still very present within the industry, and the very high rates of change of technology 
solutions (Andresen et al, 2002), it’s easy to imagine why these kinds of processes are rarely pursued by SMEs. 


In light of the potential developing a common platform or a direct integration between ERP and BIM systems, one 
of the fundamental traits must be full transparency between all the involved parts, and constant and clear 
communication. Considering the typology and the volume of data and information that would be managed in this 
scenario, this kind of communication and integration requires a properly defined planning process starting from 
the initial phases: basically, a preliminary planning stage should take place well before the designing process itself, 
so that complications and unexpected events can be prevented or properly handled beforehand. 


This sort of reengineering process is described by Kahn (2021), who proposes a total reshuffling of the typical 
designing and planning processes, resulting in a shift of the stress of workload to the initial stages, and a better 
distribution of efforts all-around. In this case, the Operation Stage, which includes 3D Coordination, design review 
and site work, is given the most emphasis, allowing for a better organization between the teams, a consequent 
reduction of designing and construction times. A transparent and well-developed coordination since the initial 
stages is therefore needed and the solution can be the connection between the ERP system and the BIM model. 


2. CASE STUDY 


For the purpose of achieving an integration between the ERP system and the BIM model, a real case study was 
analyzed. The following examination also takes into consideration the management system used by the 
construction company, in this case SAP, that was in charge of the construction of a building that covers an area of 
1400 square meters. As part of this analysis, a first attempt was made to link the ERP software and the information 
model; ideally, the goal would be to directly integrate the two system so that changes within the model (in terms 
of 3D objects volumes and quantities, for example) could be automatically recorded by the ERP software, resulting 
in the update and correction of data and information related to material orders and scheduling. 


2.1 Project content within the management and control software 


In the initial stages of construction planning, a comprehensive preliminary Bill Of Quantities (BOQ) was compiled, 
which took into consideration the cost and pricing of materials, the need for equipment and workforce, taking into 
account not only the market average prices, but also personal agreements settled between the company and the 
suppliers or subcontractors. The structure used to define the BOQ during these initial exchanges between the 
construction company and the clients was therefore replicated in the ERP system, in order to consider every 
material, item and construction task, and their placement within the construction timeframe. 


In order to translate and develop the construction timetable within the Project Builder of the software, a dedicated 
project profile was meticulously created. This project profile has been subdivided into discrete "Work Breakdown 
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Structure Elements," serving as a crucial classification system to effectively categorize the essential tasks for the 
comprehensive project plan. By using a top-down approach, the project structure branches out into multiple 
"networks", which represent the succession of activities that define the construction processes and that are 
characterized by start-to-end relationships and various interdependencies related to the time schedule of the task 
itself. These element profiles are used to plan, analyze and monitor time and resources during the whole 
construction timeframe: in this way, all the items present in the Bill Of Quantities are actually translated into the 
management system, each one representing a particular task on site and its related information. 


These elements, that essentially represent every material used during the construction project and, in this way, its 
relative activity on site, are sorted with the following data: 


Milestones, which are required to proceed with payments; 
Information regarding the production centers related to the supply chain; 


Standard duration and minimum duration for each task (in terms of days); ideally, this kind of information 
should be the result of data obtained from the production center, however, it is often necessary to proceed 


with manual data entry of the durations, according to the schedule defined by the timetable; 


Deadlines: the start/end relationships between activities are determined in this section, as well as which 
task has the priority over the others within the construction process, and any time buffers that might need 


to be taken into account due to possible delays; 


Data relating to materials’ orders, in the case of outsourced work. This section collects data referred to 
the identification of the supplier, its prices, cost items and planned delivery times, as well as data related 


to purchase requisitions managed by the accounting department. 


2.2 ERP-BIM link 


In order to develop a connection between the BIM model e the ERP system, only a small part of the whole project 
was taken into consideration. The work mainly concerned two categories of materials: concrete elements 
(exclusively focusing on the main structures, meaning the pillars and the load bearing walls on the main floor, the 
foundation slab, and roof) and drywall components. These elements were specifically chosen for a few reasons: 
firstly, these materials were enriched with the biggest volume of information - especially for the concrete, and 
therefore were considered ideal examples to test the potential transfer of data between the two system. Secondly, 
these are very different and partially antipode materials: on one hand, once arrived on site, the concrete cannot be 
stored, and has to be poured and used within a certain time frame; additionally, the concrete samples must go 
through a meticulous round of controls and checks, whose resulting documents can be collected and archived in 
the ERP system by connecting them to their relative object inside the project network. On the other hand, drywall 
elements, used here for vertical and slanted elements and for the false ceilings both, can be stored and stocked 
within the perimeter of the construction site for a long period of time, and have to undergo a less strict process of 
control and verification. 


Considering the nature and the purpose of the ERP system, the time aspect and the possibility to whether to store 
or not the materials are especially crucial: delays in previous works, changes regarding the delivery dates of 
materials, or eventual changes in the project can influence the progress and continuation of the whole construction 
phase. Therefore, the possibility to track these changes directly and quickly, whether from variations in the BIM 
model or notification from the site, is essential in order to coordinate the work time schedule, the procurement and 
the delivery of materials and resources. 


In order to then develop a proper connection, upon gathering all the pertinent data related to the tasks and materials 
taken into consideration for this analysis, links have been developed in order to establish a direct correlation 
between the ERP system and the information model. This connection was accomplished through the exportation 
of URL links related to the materials in each phase: each material is, in fact, within the project system, enriched 
with all the information collected from the accounting team (e.g. purchase orders, delivery dates, and all 
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information previously mentioned), the project team (e.g. quantities, locations, construction phase...) and from 
site personnel (lab results and any type of documents collected on site). Once extracted (Fig 2.), the hyperlink can 
be sent by e-mail. Furthermore, the mail can be sent along with eventual notes or attachments from the operator, 
to other professionals who may need it (Fig. 3). 
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» E DQM 10000000000093288 00 000 DQM 10000000000093... 
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» Gi Individual Objects 
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Figure 1. Procedure used to extract URL links related to the material; in this case, the concrete used for the 


foundation slab. 
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Figure 2. Creation of the URL link, related to the material, that can be sent via e-mail along with eventual 
notes and attachments. 
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The URL link provides direct access to the interface of the material, which collects and groups all the information 
related to the material itself and its construction activity on site, as previously mentioned. The hyperlink was then 
incorporated into the 3D object properties that belong to every object in the information model in the modelling 
software; by clicking it, the operator can directly access the ERP interface related to the material taken into account. 
In order for this connection to be operable, the personnel needs to, of course, have to access to the ERP system 
and the project at hand, and be familiar with the ERP interface and use. A basic run down of the system then would 
be needed for all the professionals within the work frame, in order to understand and make use of the connection. 


The possibility to export a link that can directly bind an object inside the model to its material in the ERP system 
enables the association of the material with multiple objects at the same time, as they refer to a specific item rather 
than a task, effectively increasing the information and data strictly related to the object within the BIM model. For 
example, rather than extracting a single different link for each pillar built, which would consist realistically into a 
different activity each on site, a singular link to the specific type of concrete used was extracted: the link then 
brings up the related item in the ERP system, showcasing prices, deadline, schedules, orders and all the data related 
to that specific activity. In this way, an effective direct link was created between the BIM model and the ERP. 


On the other hand, it is, again, necessary to point out that the access is limited to approved personnel since they 
are the only ones having the authority and responsibility to develop the project in the ERP system. Moreover, at 
stage of this case study, the linking system is not only manual, but also static, since it needs to be constantly 
updated. 


3. AUGMENTED REALITY TOOLS AND PROCUREMENT 


Another way to establish a link between the 3D BIM model and the ERP, concerns the use of the augmented reality 
(AR) tools. Through the AR technology it is in fact possible to overlap the BIM model containing the digital 
information, and the physical space of the construction site; thus, the data contained in the property set of the BIM 
model will be transposed to the matching objects in real life. 


Thanks to this superimposition, AR is conventionally primarily used for project monitoring and control, but it can 
also be employed to implement the logistical aspects related to the procurement of materials and to their storage 
within the construction site. Through the use of an AR mobile app, by selecting each 3D BIM object, it is possible 
to notify its status in real life in terms of procurement information. In this way, every member of the construction 
and designer team gets to know in real time if a specific object is identified, ordered, delivered, checked, or 
installed, and the corresponding information can be recorded in the ERP system (Wang and Love, 2012). The 
planning of the supply and procurement of materials can be easily updated, allowing a more effective planning of 
subsequent orders. Similarly, it is possible to track the availability of materials by precisely defining the 
requirement for the consecutive period, guaranteeing an on-time supply approach. 


3.1 Definition of status notification for the case study 


Each status notification groups a potentially wide amount of information, therefore their selection based upon 
predefined criteria is fundamental. In the specific case study - in addition to the above-mentioned advisory of 
identification, order, delivery, check and installation - the information considered most significant to form this type 
of report (Fig. 4) are: 


Object of the notification and location in the construction site; 


Figure in charge: in order to fill out this field, one or more companies responsible for the procedure of 


supply and installation of the object must be selected; 
Expected date for delivery and installation of the object attending to the construction schedule; 


Properties of the BIM object: this field, unlike the others, cannot be filled in by the individual completing 
the status notification. The properties of the object are inherited from the BIM model and automatically 


entered in the field to avoid errors and to speed up the signaling procedure. 
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Considering the purpose of the object status notification, the most significant property concerns the Work 
Breakdown Structure (WBS) code which uniquely identifies the selected objects. These alphanumeric codes are 
also present in the ERP network and allow the operator to identify the corresponding purchase order and to update 
it by registering a delay. 


Once all the fields have been completed, the final report will be visible on a web platform to which only specific 
members of the team have access to. The operator who is in charge of updating the orders on the ERP software 
will be notified via e-mail and will consequently look for the matching WBS code in the ERP network and act 
consequently. 


As for now, the transfer of this kind of information from the construction site to the ERP takes place manually, due 
to the lack of interoperability between the software involved in the procedure; nevertheless, it is easy to imagine 
that the automatization of the process is the next achievement that needs to be accomplished in the future 
developments of the technologies and tools involved (AR, 3D BIM, 4D BIM-ERP). 


~Ectanated delivery date 


7 April 2023 
e 
Late Delivered «Scheduled date of installation 
Identitied 11 May 2023 


Ordered nS Ea, ae 
» Date of filling in the status notification 


Checked 
11 May 2023, 6:02 pm 


WBS 
03.05.03.02 


Object identific ation code 
tukain wall perwb 


Installed 


Figure 3. From left to right: (a) Selection of the object the completion of the Status Notification is referred to. 
(b) Multiple-choice responses to indicate the object's status. (c) Properties of the BIM object, automatically 
collected from the model: the WBS code is highlighted. 


Ultimately, the two types of connections between the BIM model and the ERP system established operate in 
accordance with the diagram presented in fig.4 and necessitate a continuous update of information following the 
recording of new data obtained through the surveying of the construction site. 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


ERP System BIM Model Augmented Reality tools 
ý / N 


Each object’s property set: Status notification: 


URL link 


to the material (and its URL link 
associated information) 
WBS codes WBS codes 


Ni Na 
BIM modeller 


Y M 


Technical office personnel Construction site inspector 


= 


Figure 4. Schematic drawing of the procedure for establishing the ERP-BIM connection. An URL link is 
extracted from the ERP system and incorporated within the properties of objects in the BIM model, alongside 
WBS identification codes. Utilizing augmented reality tools, the Construction Site Inspector generates status 
notifications, which subsequently inform the BIM modelers and the Technical Office personnel of the current 
progress of the ongoing work. Both the BIM model and the information contained in the ERP system will be 

updated, and the flow will potentially start over again. 


4. CONCLUSIONS 


Through the process described, it was possible to develop a connection between the ERP system and the BIM 
model, both by the development of a hyperlink between the material stored in the management system and its own 
3D object, and by the establishment of status notifications sent directly from the construction site to the company’s 
system. 


While effectively linking the two systems, the connection created remains, as mentioned, accessible only to those 
who have access to the company's management system, as the software access credentials are required to be able 
to view the system interface. Furthermore, the hypothetical idea of connecting the material in the ERP system to 
the 3D object belonging to the information model currently constitutes a manual and non-automated task, resulting 
in an exceedingly laborious process, especially considering the number of objects modeled and the materials used. 
Moreover, the link appears to be static and unidirectional: any change made to the BIM model, such as alterations 
in volume quantities or material types, are not directly perceived and recorded by the ERP system, and every 
modification has to be arranged manually. Similarly, item entries updates within the ERP system do not 
automatically update within the link, which would need to be exported again and consequently replaced in the 3D 
object properties. 


As mentioned, it is still a very preliminary and underdeveloped connection, still manual for the most part, and far 
from the potential goal of streamlined automatization between the two systems. 


To achieve a higher level of integration, the development of a potential plug-in bridging the gap between the 
modeling software and ERP software would be necessary: since these two systems were designed for different 
purposes and objectives, mostly due to the different industries they were developed for and by, establishing a direct 
connection between the two programs would entail the involvement of the respective software distribution 
companies. Their cooperation and a great amount of resources, both in terms of time and investments, would be 
required to generate a bidirectional linkage between the systems. 


In the eventuality of such a development, it would become crucial to take into consideration the high level of 
variability that characterizes not only construction projects, but also the coordination and management of these 
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projects by construction firms. The perspective of a full-scale ERP-BIM integration would imply the 
standardization of the designing, planning and managing processes within the AEC industry, so that the transaction 
developed within the software can be distributed and used universally. This would result in an adjustment of the 
management procedures between the companies, further facilitating collaboration and communication. 
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ABSTRACT: The research focuses on an approach for the development of an automated workflow in the field of 
fire prevention for the validation of BIM models regarding “Subjected Activities” (according to Italian law - 
Presidential Decree 151/2011), with reference to the contents of the Horizontal Technical Regulation of the Fire 
Prevention Code. In fact, since 2019, the Italian Fire Department, through the Fire Digital Check project, is 
seeking to digitize the Fire Prevention Code in order to allow professionals to integrate their BIM models with the 
objectives required by the fire regulations. The workflow proposed involves three distinct processes, integrating 
them within a single workflow: study of the technical regulations and transposition into the digital environment 
using proven methods such as RASE, Tx3 and TIO, information modelling of a case study and implementation of 
validation algorithms using Solibri software. One of the main points of the proposed process is to use an open and 
interoperable format such as that offered by the IFC schema for the information exchange between the different 
software involved. 


KEYWORDS: fire prevention, BIM, validation, code checking, digitization 


1. INTRODUCTION 


With the Industry 4.0 approach, the construction sector has been involved in the development of systematic 
digitization processes (Daniotti et al., 2022) and its progressive transformation into a data-driven production sector 
appears to be irreversibly (Méda et al., 2021). Furthermore, in recent years the introduction of information 
management with Building Information Modeling (BIM) tools and methodologies in public supply, service and 
works contracts has highlighted the need to define structured and planned data flows and information exchange 
between the various phases of the construction process. This text is intended as an in-depth study on the subject of 
digitization within the technical-administrative processes concerning public administrations with a thematic focus 
on fire safety in Italy. In particular, it describes and presents an application of a workflow aimed at checking BIM 
models to be submitted to algorithms for automatic validation and verification of the contents of the Fire Prevention 
Code D.M. 03-08-2015. 


The following software was selected for the application of the workflow, which met the requirements that emerged 
in terms of flexibility in model creation, data export and control: Autodesk Revit, as BIM authoring software, and 
Solibri Model Checker, as BIM model checking software. The proposed structure, however, want to be more 
general, in fact it does not require the use of specific software and exchange formats, and for its implementation 
reference was made to proven methodologies for the translation of normative code such as TIO, Tx3, RASE. The 
exchange of data between proprietary software was also carried out using the IFC (Industry Foundation Classes) 
format, which is an open standard, developed by buildingSMART, capable of describing the ontology of the 
building and the different specialisations related to a generic building process, through the use of entities, 
properties and relationships. In addition, the use of such data schema has become mandatory for any BIM services 
in the public sector in Italy. 


The application of the workflow presented in this paper refers to “subjected activities” to the controls of the the 
National Fire and Rescue Service in accordance with D.P.R. 151/2011, without a Vertical Technical Rule, within 
the project evaluation procedure. This is limited to the automatic validation and verification of the contents of the 
Fire Prevention Code D.M. 03-08-2015 at Section S.1 - Reaction to fire. The starting point and main reference of 
this work was the Fire Digital Check project, born from the will of the Department of Fire, Public Rescue and 
Civil Defence to undertake the necessary digitisation process of the fire prevention procedures provided by D.P.R. 
151/2011. 


2. STATE OF ART 


BIM modelling of new or existing buildings guarantees a homogeneous database capable of transmitting 
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information not only regarding the geometric component of an object but also its informative and documentary 
characterisation. The model assumes in this context not only the meaning of a geometric representation of building 
components but rather becomes a database where all the information produced in the different phases of the 
building process can be collected (C. M. Eastman, 2011). This methodology can be applied for new buildings as 
well as for existing buildings (Volk et al., 2014), and in particular much research is being developed on HBIM 
field (Murphy et al., 2013) for the digitisation of historical heritage (Biagini et al., 2022). The digitisation of all 
this assets must, however, be followed at the same time by a digitisation also of the national and international 
regulations so that it becomes possible to apply automated or semi-automated methodologies for the verification 
of these rules and the issuing of authorisations in the design phases of an asset. The concepts of clash detection 
and code checking, defined in the UNI 11337 standard as analysis of possible geometric interferences between 
objects, models and drawings’, and 'information inconsistencies of objects’, respectively, become important in this 
context. The automation of these checks with different tools during the design phases (C. Eastman et al., 2009) has 
led to many experiments over the years in different areas around the construction process, such as the verification 
of health and safety requirements in the design of the site layout (Getuli et al., 2017). The Singapore government 
has been a forerunner in this discipline through the CORENET e-PlanCheck project, developed by a collaboration 
between the Ministry of National Development and the Construction and Real Estate Network, which proposed 
the automatic checking of BIM models in the open IFC format. In general, code checking can be set up as a multi- 
domain framework based on a set of parametric rules (Solihin & Eastman, 2016; Zhang et al., 2013) that can be 
generated on the basis of quantitative specifications derived from a text. 


The objective of this study is to understand which parts of a specific regulation are most easily digitised and how 
to translate these into verifiable entities or parameters within a BIM model through a combination of different 
methods (Solihin & Eastman, 2015). A reproducible workflow for the digitisation of a regulation using proven 
methods, such as TIO, Tx3, RASE, is then proposed for analysis (Hjelseth & Nisbet, 2011). 


RASE (Requirement - Applies - Select - Exception), is a methodology aimed at identifying key points within the 
regulatory text. The latter are selected, classified and organised schematically with the aim of simplifying the 
elaboration of rules that can be implemented through programming languages. These operations are generally 
carried out by highlighting the text using coloured mark-ups that correspond to precise types of content within it. 
The portions of text that can be defined as necessary or requested requirements are highlighted In blue 
(requirements), the portions that define the scope of applicability of the requests or necessary requirements are 
highlighted in green (applies), the portions of text that contain the object or reference to the request or necessary 
prerequisite are highlighted ), in red (select), and finally the portions of text that refer to an exception or that 
otherwise restrict the scope of applicability of the requested requirements are highlighted in yellow (exception). 


Tx3 (Transcribe - Transform - Transfer), is a method aimed at expressing the degree of translatability of regulatory 
text into digital language. This is a very important assessment to make, since this translation process is not always 
advantageous and it may happen that the result obtained does not justify the time taken to achieve it. Basically, the 
method involves the classification of standards into three different categories: transcribe, i.e. those parts of 
normative texts that can be easily translated into digital language, usually prescriptive norms fall into this category; 
transform, parts of normative texts that can be rewritten while retaining their implementation purpose, but which 
often require the introduction of constraints such as qualitatively defined benchmarks but from which it is still 
possible to translate the requirements quantitatively; transfer, instructions that cannot be implemented due to the 
imprecise way in which they are expressed within the reference text, such as benchmarks that are not defined 
quantitatively but only qualitatively. 


TIO (Test Indicator Objectives), has the objective of bridging the gap between qualitative and quantitative 
requirements in such a way as to be able to increase the degree of digitisation for the former, thus allowing them 
to be incorporated into the process. This is a very important step as the most difficult requirements to translate are 
often qualitative ones, because they cannot always be correlated to quantitative parameters or measurable 
quantities. 


In order to verify the applicability of these procedures, it was decided to take as a reference the Fire Prevention 
Code, for which a discussion table on its possible digitisation has already been started in Italy at the beginning of 
2019 as part of the BIM-FDC (Fire Digital Check) project, aimed at automating the validation of Fire Prevention 
projects drawn up with the code, limited to compliant solutions. A combination of the above methods was then 
used to digitise the regulatory text, limited to section S.1 of the code. 


The TIO method, aimed at translating the requirements expressed in qualitative terms, and at identifying 
parameters that allow their verification and control, allowed the results of the risk analysis to be parameterised, 
and to correlate parameters such as Rvita, Rpeni, Rambiente that characterise an activity (or compartment) to specific 
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requirements. The Tx3 method, on the other hand, was used to verify only the compliant solutions foreseen by the 
Fire Prevention Code. Given the high specificity of the alternative solutions, which do not generally have a 
prescriptive character, these must eventually be treated in a specific way for each case study and are not suitable 
for verification within a general automatic validation methodology. Finally, by means of the RASE methodology, 
the object to which the regulatory prescriptions of Section S.1 refer is identified: in this specific case we refer to 
the scope of the subjected activities, defined by the code in section G. In addition, a distinction has been made 
between the areas that constitute "escape routes" of the activity (or compartment) and areas that constitute "other 
spaces" of the activity, since different levels of performance are generally required of these, depending on the Life 
Risk attributed to the activity (or compartment). 


3. METHODOLOGY 


The workflow presented within this paper is based on the application of the BIM methodology in the field of fire 
risk management and the validation of the related regulatory features. Specifically, the objective is the automatic 
validation of BIM models concerning subjected activities (according to D.P.R. 151/2011), with reference to the 
contents of the Horizontal Technical Regulation of the Fire Prevention Code. Three distinct areas can therefore be 
found in the process: technical regulations, BIM modelling and validation algorithms. Each of these areas must 
relate to the others, and for each of these relationships objectives and criteria for their achievement must be defined. 


Technical Standard - BIM Modelling. Technical regulations contribute substantially to determining level of 
development of BIM models. Operationally, therefore, it determines what is to be modelled. 


Technical Standard - Validation Algorithms. Technical standard provides the basic parameters, criteria and logic 
for the implementation of validation algorithms. 


BIM Modelling - Validation Algorithms. The data structures of BIM models must be organised according to 
predefined criteria, so that the necessary information is available for the application of validation algorithms. At 
this juncture, a standard must be defined for the databases on which the logic of the algorithms can be developed. 


3.1 Preliminary study of legislation and information exchange 


Annex I of the Fire Prevention Code 'Technical Regulations for Fire Prevention’ consists of the following Sections: 
G (Generalities), S (Fire Strategy), V (Vertical Technical Rules) and M (Methods). From the study of the relative 
contents of the Sections, with reference to the scope of application and the objectives set, it was possible to define 
a workflow for the digitisation of the Code as shown in the figure (fig. 1). 


The following methodology is based on the following principles: 
1 - validation is performed on BIM models inherent to a specific subject activity (pursuant to D.P.R. 151/2011); 


2 - risk analysis, the definition of design goals and safety objectives, and risk assessment, are steps that must 
necessarily precede the entire workflow from the creation of the BIM model to its validation. These steps need a 
specific assessment by a fire safety professional. From here, we arrive at the quantitative definition of the Rvita, 
Reni, Rambiente parameters for each compartment of the subject activity, which represent input data for the project; 


3 - on the basis of the Rvita, Reni, Rambiente parameters, the minimum required performance levels are identified 
within the chapters of section S of the Fire Prevention Code, and for these, through the digitisation of the regulatory 
text, the parameters that determine compliance with the relative solutions are identified; 


4 - when creating the BIM model inherent to the subject activity, geometric and alphanumeric data are defined by 
the fire protection professional responsible for the project; 


5 - the achievement of these performance levels is verified through the application of the compliant solutions set 
out in the code, which, in general, are prescriptive in nature. Verification must take place by means of automatic 
validation procedures, using a library of algorithms specifically created for this purpose, which is applicable to 
each subject activity without a Vertical Technical Rule. This library of algorithms must also be scalable and 
reusable for more complex projects; 


6 - the workflow does not provide for the application of alternative solutions to achieve the minimum performance 
levels. 
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Preliminary study of the normative text 


Ve 


Application of methods for text digitisation: TIO, Tx3 


Vv 


Automatic Validation Scope Circumscription 


y 


Application of the RASE method for the digitisation of 
normative text, identifying: 


- Reference area 
- Performance requirements 
- Reference Entities 


- Exceptions 


ay" 


Schematisation of the regulatory structure and 
identification of the objects and parameters functional 
to the automatic verification of the Code for models 


Fig. 1: Workflow for digital translation and parameterisation of normative text. 


Based on these assumptions, three main work packages were then set up: the analysis of the regulatory text and its 
digitisation, the BIM modelling of the activity subject to regulatory control, and the application of algorithms for 
the validation of the BIM model. The first phase involved the definition of information requirements, in the form 
of alphanumeric parameters and geometric characteristics, which were to be included in the model and 
subsequently subjected to verification. The second phase involved the modelling of the considered activity, 
including not only the geometry, but also the design parameters that need to be. The last phase, on the other hand, 
led to the application of validation algorithms of the BIM model with reference to the contents of the Fire 
Prevention Code in order to verify the achievement of the minimum performance levels required for a generic 
subject activity. 


Whereas the objectives and the actors involved in this information exchange process, each with its own tools and 
software, it is necessary to answer to two needs: to provide BIM authoring and code checking software capable of 
managing open and interoperable exchange formats, and to define a standard for the data structures to be 
exchanged, so that the information content within the model can be correctly analysed by the automatic validation 
algorithms used by the Fire and Rescue Service. Autodesk Revit BIM authoring software was then used to create 
the geometric model and assign the required parameters, then its information content was exported in IFC format. 
Once exported in IFC, it is possible to open this model within any model checker software, in our case the Solibri 
Model Checker software (Office version) was used, and to perform the validation operations on several levels. 
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The image below (fig. 2) shows the schematisation of which entities were involved in the verification algorithms. 
In particular, it shows the hierarchy of the spaces within the entire activity subject to inspection and where the 
areas for which specific performance levels are to be verified are located. 


Ambiti areas Ambiti areas 


Escape routes / Generic rooms 


Fig. 2: Outline of the functional structure for digitising the code - Section S.1. 


To achieve the required performance levels, some of the objects in the areas will have to have certain fire 
characteristics. The diagram below (fig. 3) shows what was modelled in the workflow application. 


Ambiti areas — Escape routes 


Upholstered furniture 
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oatings 
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Fig. 3: Identification of the entities subject to verification within the schematic structure for code digitisation 
purposes. 


3.2 P _ set definition 


The process of rule interpretation and digitisation of the normative reference text necessarily entails the definition 
of specific parameters that will subsequently be used in the verification process. Furthermore, in order to efficiently 
organise the data structure contained in the model, it is necessary for parameters to be grouped into specific 
Property Sets. This will make it easier to retrieve the necessary data to be analysed at a later stage. As part of the 
digitisation of section S.1 on the achievement of Fire Reaction Performance Levels, with reference to the 
normative structure outlined in the previous chapter, the elements to be modelled and the parameters to be assigned 
to them are defined. 


Two entities were modelled within the BIM authoring software: activity scope and material. The tables below 
show the parameters associated with these two entities. 
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Table 1: List of parameters associated with the Ambiti areas of activity. 
Property name Property type Description 


NomeCompartimento IfcLabel Its value makes it possible to define to which compartment a given domain 
belongs, so that all domains belonging to a given compartment can be filtered in 


the validation processes 


RVita IfcLabel Its value corresponds to the Life Risk determined during the risk assessment for a 
given compartment. By assigning the Life-Risk of the relevant compartment to 
each area, the performance level required for the Fire Reaction characteristics can 
be easily linked to it. 


ViaDiEsodo IfcBoolean Its value makes it possible to determine the type of scope, as this contributes to 


determining the required minimum performance levels. 


Table 2: List of parameters associated with materials. 


Property name Property type Description 


ClassificatoReazioneAlFuoco IfcBoolean Its value is used to determine whether an object representing the Material entity 
within the model falls within those listed in Tables S.1-5, S.1-6, S.1-7, S.1-8. 


GruppoReazioneAlFuoco IfcLabel Its value, assigned with reference to the provisions of Ministerial Decree 26/6/1984 
and Ministerial Decree 10/3/2005 by the fire protection designer, frames the fire 
reaction characteristics of a material, which, as can be seen in the normative text, 


are closely related to the minimum performance levels indicated by the law. 


4. CASE STUDY 


Modelling inside the BIM authoring software Autodesk Revit, within the scope of application of this methodology, 
is carried out for a simplified case study characterised by the structure shown in the table. 


Table 3: Pilot model structure. 


ACTIVITY SUBJECT TO FIRE BRIGADE INSPECTIONS 


COMPARTMENT 01 COMPARTMENT 02 
Ryita = D1 — Roeni = 1 Rambiente = NC Ryita = B2 — Ryeni = 1 Rambiente = NC 
Ambito 01 Escape Route Ambito 02 Escape Route 
Ambito 03 Generic Room 


Each Ambito (area of activity) within the Revit software was modelled as space, then exported as IfcSpace class 
within the IFC scheme. Regarding the modelling of the construction and furnishing elements that are covered in 
the tables of the Fire Prevention Code (S.1-5, S.1-6, S.1-7, S.1-8), and subject to verification in the validation 
process, the objects in the table below were modelled. 


Table 4: Identification of the entities subject to verification within the pilot model structure. 


n.3 Upholstered furniture [Category: Furniture] 


[Compartment 01] Ambito 01 n.1 Floor covering [Category: Floor] 


n.1 Suspended ceiling [Category: Ceiling] 


n.1 Protected insulation [Category: Wall] 


n.3 Upholstered furniture [Category: Furniture] 


Ambito 02 n.1 Floor covering [Category: Floor]] 
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n.1 Suspended ceiling [Category: Ceiling] 


[Compartment 02] n.1 Protected insulation [Category: Wall] 


n.3 Upholstered furniture [Category: Furniture] 


Ambito 03 n.1 Floor covering [Category: Floor]] 


n.1 Suspended ceiling [Category: Ceiling] 


n.1 Protected insulation [Category: Wall] 


Proprieta x 8 GD) x 


VVF_Mobile imbottito 
Poltrona imbottita 600x600x600mm 


Arredi (1) v| FB Modifica tipo 
Altezza 0.6000 A 
Larghezza 0.6000 
Lunghezza 0.6000 

Dati identita R 
Immagine 
Commenti 
Contrassegno 

Fasi a 
Fase di creazione Stato di Progetto 
Fase di demolizione Nessuno 

Parametn IFC a 
ClassificatoReazioneAlFuoco A 
GruppoReazioneAlFuoco 
IfcExportAs ifcFurniture 


Fig. 4: Functional parameterisation for automatic validation of entities within the model. 


To reach the level of information contained in the model and to implement the subsequent automatic validation 
process, it is necessary to enter the parameters foreseen during the digitisation of the normative text by assigning 
them to the modelled entities. These parameters were created as shared parameters, organising them in an external 
text file and therefore reusable in other models. Once the parameters had been assigned to the categories of the 
modelled objects, the data subject to verification were entered (fig. 4). The data assigned to the parameter 
GruppoReazioneAlFuoco did not in all cases reflect the requirements of the Code for achieving the minimum 
performance level required of the area belonging to a particular Compartment. This was an intentional error in 
order to show any issues that may result from the analysis. 


The next step involved exporting the model created in the BIM authoring software in the open IFC format. The 
operation was carried out by first checking the settings of Revit's IfcExporter, analysing that the correct IFC classes 
corresponded to the different entity categories present in the model. Should it prove useful to further specify the 
IFC export class for a single instance, it is possible to exploit the shared parameters collected in the specially 
provided PSet IFC_Exporter Property Set. As far as the export of parameters within the IFC format is concerned, 
these were organised in specific custom P_ sets, through the structuring of a special file in .txt format subsequently 
called up in the Revit software export settings. Once the model was exported in the IFC schema, it was possible 
to open the latter within the model checking software. The image below shows how the imported model, displayed 
in the three-dimensional view, is represented by an organised data structure characterised by the hierarchy defined 
during the model creation phase (fig. 5). 


At this point, within the model checking software we proceed to set up the groups of validation algorithms for the 
BIM model. Referring to what emerged from the study of the structure of the Fire Prevention Code, for the 
complete realisation of a library of verification algorithms, a library of Rulesets organised as shown in the table 5 
is envisaged. 
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Fig. 5: Representation of the BIM model within the automatic validation software. 


Table 5: General ruleset structure for the automatic validation of the fire prevention code 


RULESETS 
Ruleset Name Function 

VVF_S.1_ReazioneAlFuoco Verification of conforming solutions 
VVF_S.2_ResistenzaAlFuoco Verification of conforming solutions 
VVE_S.3_Compartimentazione Verification of conforming solutions 
VVE_S.4_Esodo Verification of conforming solutions 
VVF_S.5_GSA Verification of conforming solutions 
VVE_S.6 ControlloIncendio Verification of conforming solutions 
VVE _S.7_RivelazioneAllarme Verification of conforming solutions 
VVE_S.8_ControlloFumiCalore Verification of conforming solutions 
VVE_S.9_OperativitaAntincendio Verification of conforming solutions 
VVE_S.10_SicurezzalmpiantiTecnologiciServizio Verification of conforming solutions 
VVF_G_AnalisiDelRischio Data compilation check 

VVF_Altro Data compilation check 


Each of the rulesets listed above will contain specific validation algorithms for the BIM model built with reference 
to the Code conforming solutions set out in the ten Chapters of Section S of the Fire Strategy. Two further rulesets 
are also foreseen: VVF_G _AnalisiDelRischio, dedicated to the control of the correct compilation of the 
parameters relative to the risk analysis, and VVF_ Altro, dedicated to the control of parameters not directly 
referable to the contents of sections G and S of the regulations, but necessary to define the more complex validation 
algorithms. 


Accordingly, for the purpose of validating the contents of Section S.1 of the Code, the structuring of the internal 
subgroups of the Ruleset VVF_S.1_ReazioneAlFuoco is given. 
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Table 6: Ruleset structure for the verification of reaction to fire performance for a compartment characterised by 
a generic risk profile. 


RULESET - VVF_S.1_ReazioneAlFuoco 


Subgroups of validation algorithms Function 


VVF_S.1_ReazioneAlFuoco_Controllo Verification of correct compilation of validation parameters 


VVF_S.1_ReazioneAlFuoco_A1 


Verification of Conforming Solutions by Activity or Compartment Ryita Al 


VVF_S.1_ReazioneAlFuoco_A2 


Verification of Conforming Solutions by Activity or Compartment Ryita A2 


VVF_S.1_ReazioneAlFuoco_A3 


Verification of Conforming Solutions by Activity or Compartment Ry it, A3 


VVF_S.1_ReazioneAlFuoco_A4 


Verification of Conforming Solutions by Activity or Compartment Ryit A4 


VVF_S.1_ReazioneAlFuoco_B1 


Verification of Conforming Solutions by Activity or Compartment Ryita B1 


VVF_S.1_ReazioneAlFuoco_B2 


Verification of Conforming Solutions by Activity or Compartment Ryita B2 


VVF_S.1_ReazioneAlFuoco_B3 


Verification of Conforming Solutions by Activity or Compartment Ryita B3 


VVF_S.1_ReazioneAlFuoco_C1 


Verification of Conforming Solutions by Activity or Compartment Ryita C1 


VVF_S.1_ReazioneAlFuoco_C2 


Verification of Conforming Solutions by Activity or Compartment Ryita C2 


VVF_S.1_ReazioneAlFuoco_C3 


Verification of Conforming Solutions by Activity or Compartment Ryita C3 


VVF_S.1_ReazioneAlFuoco_Cil 


Verification of Conforming Solutions by Activity or Compartment Ryita Cil 


VVF_S.1_ReazioneAlFuoco_Ci2 


Verification of Conforming Solutions by Activity or Compartment Ryita Ci2 


VVF_S.1_ReazioneAlFuoco_Ci3 


Verification of Conforming Solutions by Activity or Compartment Ryita Ci3 


VVF_S.1_ReazioneAlFuoco_Ciil 


Verification of Conforming Solutions by Activity or Compartment Ryita Ciil 


VVF_S.1_ReazioneAlFuoco_Cii2 


Verification of Conforming Solutions by Activity or Compartment Ry it, Cii2 


VVF _S.1_ReazioneAlFuoco_Cii3 


Verification of Conforming Solutions by Activity or Compartment Ryita Cii3 


VVF_S.1_ReazioneAlFuoco_Ciiil 


Verification of Conforming Solutions by Activity or Compartment Ryit Ciiil 


VVF_S.1_ReazioneAlFuoco_Ciii2 


Verification of Conforming Solutions by Activity or Compartment Ryjita Ciii2 


VVF_S.1_ReazioneAlFuoco_Ciii3 


Verification of Conforming Solutions by Activity or Compartment Ryita Ciii3 


VVF_S.1_ReazioneAlFuoco_D1 


Verification of Conforming Solutions by Activity or Compartment Ryita D1 


VVF_S.1_ReazioneAlFuoco_D2 


Verification of Conforming Solutions by Activity or Compartment Rvit D2 


VVF_S.1_ReazioneAlFuoco_E1 


Verification of Conforming Solutions by Activity or Compartment Ryita E1 


VVF_S.1_ReazioneAlFuoco_E2 


Verification of Conforming Solutions by Activity or Compartment Ryita E2 


VVF_S.1_ReazioneAlFuoco_E3 


Verification of Conforming Solutions by Activity or Compartment Ryita E3 


As can be seen from the list above, net of the group of rules necessary to control the parameters strictly related to 
the verification for section S.1, within the Ruleset VVF_S.1_ReazioneAlFuoco there are subgroups of validation 
algorithms which refer to the specific Rvita attributed to each compartment of the activity, defined in the risk 
assessment phase. Each subgroup of the ruleset VVF_S.1_ReactionAlFuoco will be the container of validation 
algorithms necessary to filter the objects to be verified and which in general belong to different types of areas. It 
may be noted that this type of organisation of the structure for the verification algorithms allows the application 
of the Ruleset VVF_S.1_ReactionAlFuoco to a BIM model of any activity, even a complex one, and made up of 
compartments to which different risk assessment parameters are attributed. 


It should be noted that the verification algorithms were created from the library contained in the Solibri Model 
Checker software, which provides a very extensive archive of basic rules, with which even complex validation 
algorithms can be created. We take the validation algorithms contained in the Ruleset VVF_S.1_ReazioneAlFuoco 
as an example, highlighting the following subgroups: VVF_S.1 ReazioneAlFuoco Controllo and 
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VVF _S.1_ReactionAlFuoco_B2. 


The subgroup VVF_S.1_ReazioneAlFuoco_Controllo (fig. 6) provides two control algorithms that verify the 
presence of the parameters and the correct input of the relative values for the purposes of subsequent Code checks. 
The first check verifies that all the elements requiring a Fire Reaction classification, and therefore referable to 
Tables S.1-5 S.1-6 S.1-7 S.1-8 contained in Section 8.1 of the Fire Prevention Code, have a plausible value 
assigned [GM0, GM1, GM2, GM3, GM4] to the GroupFireReaction parameter. The second check verifies the 
presence of the parameters NomeCompartimento, Rvita, ViaDiEsodo and their compilation with plausible values. 


v (5) WE-_S.1_ReazioneAlFuoco_Controllo 
§ Controllo 01_DefinizioneParametro_GruppoReazioneAlFuoco SOL/203/2.4 ® 


Controllo 02_DefinizioneParametriAmbitiAttivià SOL/203/2.4 


Fig. 6: Structure of the control ruleset for the correct compilation of parameters 


The subgroup VVF_S.l_ReazioneAlFuoco_B2 (fig. 7) has been divided into two further subgroups whose 
function is to check the two types of Ambiti areas identified by the Code in Section S.1, namely Vie d’Esodo [VE] 
and Altri locali [AL]. The rule Ambiti Vie d'esodo_Rvita=B2 has the purpose of filtering all the entities within the 
model which must comply with Fire Reaction requirements, and which belong, specifically, to Activity Ambiti 
areas which are Exit Routes and which belong to compartments to which an Rvita=B2 has been attributed during 
the risk assessment phase. The Fire Reaction characteristics of all objects belonging to these areas will be 
specifically verified by the following algorithms. The image below shows by way of example one of the 
verification algorithms which hierarchically follow the rule Ambiti Vie d'esodo_Rvita=B2. 


1_ ReazironeAlFucco_B:z 
v ©) WeF-_s.1_82_vE 
v § Ambiti Vie d'esodo_Rvita=B2 SOL/1/5.0 @ 
S GruppoReazioneAlFuoco_Pavimenti SOL/9/3.1 © 
§ GruppoReazioneAlFucco_Controsoffitti SOL/9/3.1 © 
§ GruppoReazioneAlFuoco_isolanti SOL/9/3.1 ® 
Fig. 7: Ruleset structure for the identification and verification of performance for the reaction to fire of 
compartments. 


In particular, the tab below refers to the rule GruppoReazioneAlFuoco_Arredi (fig. 8). This takes into account all 
the elements filtered by the previous algorithm classified as Furniture, for which specific reaction-to-fire 
characteristics are specified. 


Component Property Allowed Value 
- Furniture PSet_VVF_ReazioneAlFuoco.GruppoReazioneAlFuoco 
A Furniture PSet_VVF_ReazioneAlFuoco.GruppoReazioneAlFuoco 
AX Furniture PSet_VVF_ReazioneAlFuoco.GruppoReazioneAlFuoco GMO 


Fig. 8: Definition of possible values for the parameters under verification. 


The possible values of the property GruppoReazioneAIFuoco, in this specific case described, are GM2, GM1, 
GMO. The operation of the remaining rules underlying the rule Ambiti d'esodo_Rvita=B2 is entirely similar and 
should be defined for each of the materials defined in Tables 5 S.1-6 S.1-7 S.1-8 contained in Section S.1 of the 
Fire Prevention Code. 


Two types of errors found during Code Checking are given as examples: 


- Failure to compile the parameters (algorithm for checking the formal correctness of the information model). For 
some instances, the GruppoReazioneAlFuoco parameter is not filled in, although the objects they represent must 
necessarily respect precise reaction to fire characteristics to reach the minimum performance level required by the 
Standard. 


- Reaction to fire of materials not suitable for the Rvita value of the compartment (algorithm for checking the 
technical correctness of the information model). Some instances do not have suitable fire reaction characteristics 
with respect to the area and compartment in which they are located. The value assigned to the parameter 


463 


GruppoReazioneAIFuoco, which are not among those covered by the validation algorithm. 


At the end of the validation procedure, the Solibri Model Checking software can generate a report of the issues 
that emerged during the analysis phase that can be exported in various formats, including the open BCF format, 
allowing further possibilities for the development of the data flow considered in the methodology presented. 


5. CONCLUSION 


In the methodology just shown it was possible to observe how automated verification procedures can be 
implemented for the verification of the Fire Prevention Code, at least regarding the compartment of prescriptive 
measures given by the conforming solutions reported in the different sections of the RTO of the fire prevention 
strategy. Although the work presented focuses only on the contents of section S.1, given the RTO's structure and 
potential, as well as the flexible interoperability between Revit and Solibri software, there appears to be ample 
scope for the development of Code Checking of the other sections. 


In any case, it is necessary to emphasise the importance of developing a standard for BIM modelling, and in 
particular for the structure of IFC models that, in view of the digitalisation of fire prevention procedures, will have 
to undergo automatic validation by the Technical Office of the Fire Service. In fact, the same organizaton will 
implement the code checking phase and it will define the data structure of the IFC model that can then be linked 
to its set of automatic validation algorithms. Validation algorithms are an information verification tool that needs 
databases structured according to a declared standard to function properly. 


Regarding the necessary characteristics of the BIM authoring software useful for the application of the workflow 
described in this text, a high degree of flexibility in the export of data in IFC format is considered of fundamental 
importance, which means having both the possibility of creating user-defined P_sets and the possibility of 
assigning user-defined properties to the objects in the model. These features will allow the creation of the data 
structure in IFC format, as required by the validator. 
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ABSTRACT: Construction and operating costs of residential buildings are important. Because, it can help 
designers, builders, owners, and renters make informed decisions about where and what to buy or rent. One of the 
most significant operating costs of residences is energy cost. More specifically, heating, ventilation, and air 
conditioning account for as much as 35% of the overall energy consumption of buildings in the world. Thus, the 
problem that this research paper addresses is the decision trade-off of construction costs vs. operating costs. 
Therefore, this paper aims to perform a techno-economic analysis of exterior residential wall-type alternatives in 
a warm-humid climate. The research followed a quantitative methodology using a virtual case study with multi- 
objective analysis. The results of this study show the significant importance of the buildings infiltration on the 
operational savings and the return on investment (ROI) of the different types of exterior residential walls. and 
emphasizes the importance of a holistic approach to energy conservation regulations. The novelty of this study is 
the emphasis on the importance of infiltration in pre-construction decision-making. The broader impact of this 
result is that the International Energy Conservation Code (IECC) and similar standards could be revised to reduce 
energy consumption and reduce greenhouse gas emissions produced during energy generation. 


KEYWORDS: Residential, Building Performance, Construction Cost Estimating, Insulation, Infiltration, Return 
on Investment, Decision Support. 


1. INTRODUCTION/OVERVIEW 


Buildings are a major consumer of energy, accounting for approximately 40% of all energy demands for many 
countries (Farhanich & Sattari, 2006; Ogulata et al., 2002; Vine & Kazakevicius, 1999). Likewise in the U.S., the 
building sector accounts for about 40% of all primary energy use, and 76% of electricity use, and is responsible 
for the significant associated greenhouse gas (GHG) emissions. The major areas of energy consumption in 
buildings are heating, ventilation, and air conditioning (HVAC) which account for approximately 35% of total 
building energy. (Department of Energy, 2015) 


Various elements affect the energy consumption of buildings. The most important element is the thermal envelope 
which includes all building components separating conditioned spaces from unconditioned spaces or outside 
ambient conditions and through which heat is transferred (ECC, 2015). The thermal envelope assembly can have 
a positive or a detrimental impact on the overall building performance and therefore on the HVAC energy 
consumption. Although the insulative properties of exterior walls and windows are commonly regulated, with a 
minimum R-value or maximum U-value, other parameters are not normally considered. A higher insulative value 
of a wall is not always advantageous, and can also increase the heating or cooling loads, in some cases, despite 
complying with the legislative requirements for each location (D’Agostino et al., 2019). Another significant 
parameter that could hurt the performance of buildings is infiltration. Infiltration is airflow into and out of buildings 
through unintentional leakage in the thermal envelope due to pressure differences induced by wind, indoor-outdoor 
temperature differences, and the operation of ventilation and other building systems (Persily et al., 2019.). Air 
infiltration has a significant influence on the energy performance of buildings and can result in excessive energy 
demand to maintain adequate indoor comfort levels (Ji et al., 2017; Persily et al., 2019.). 


The problem that this research paper addresses is the effect that different exterior wall assemblies can have on the 
operating cost of a building accounting for both the insulative properties and the infiltration level of the building. 
Furthermore, this study also addresses the initial construction costs to provide insight into the economic viability 
of these construction updates. The effect of the exterior wall assembly on the building’s energy use is a complex 
issue that depends on many parameters including climate, building use, building design, and materiality (Kaynakli, 
2012). Improving the thermal performance of the building envelope can remarkably enhance the whole building’s 
energy efficiency (Abanda & Byers, 2016; Huang et al., 2020). There are various ways of improving the building 
envelope. As previously mentioned, thermal insulation is one of the most valuable tools in achieving energy 
conservation in buildings (Ghrab-Morcos, 2005; Kaynakli, 2012; Wang et al., 2007). Furthermore, air infiltration 
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improvements can potentially lead to HVAC energy savings on the order of 26% (Tian et al., 2019). The objective 
of this paper is a techno-economic analysis of an exterior residential wall in a warm-humid climate. The analysis 
focuses on the effects of the insulative properties of the exterior wall (R-value) and the infiltration rate on the 
HVAC energy consumption. Furthermore, this study includes a cost estimation analysis for the initial construction 
costs associated with these assemblies, to quantify the economic feasibility of these proposed construction updates. 


2. METHODOLOGY 
2.1 Overview of the methodology used 


A quantitative research methodology was used in this project using a virtual case study. The virtual case study 
consisted of a residential building with different wall types and infiltration levels. Each of the wall systems was 
modeled and the construction cost as well as the energy consumption/operating were determined. This was 
achieved in three stages: 1- Data Collection, 2- Energy Performance Modeling, and 3- Results Analysis as shown 
in Figure 1. 


ee ae Energy Performance 


Modeling 


I j r i 
INIST : Residential Building Design| Baseline IECC compliant model Energy consumption results 
Ifor Energy and Sustainability 
l Assessment 


Insulation iterations from R-13 Construction cost estimation 


I to R-29 
i U.S. Census Bureau: 


1 Characteristics of New Housing 
| 

i IECC : Code regulations 

| 

1 RS Means: Exterior Wall 

, Assemblies and Construction 

1 Cost estimating data 


Operating cost estimation 


Infiltration iterations from 


Return On Investment(RO}) 
ACH 5 to ACH 1.25 


Figure 1: Methodology Diagram 


2.2 Data collection process 


The data collection was done from multiple sources to design and model the residential building that accurately 
represented the energy consumption standard and building code in the case study area. 


NIST: The general building characteristics were based on the “Prototype Residential Building Design for Energy 
and Sustainability Assessment” which was published by the National Institute of Standards and Technology (NIST) 
(Kneifel, 2012). 


U.S. Census Bureau: The exterior wall framing type appropriate for this study was selected based on the U.S. 
Census Bureau’s database on residential buildings (U.S. Census Bureau, n.d.). As can be seen in Figure 2, for many 
years wood-frame construction has been the predominant type of exterior wall framing for all residential 
construction in the U.S. The latest available data shows that out of the 970 thousand new single-family houses 
built in 2021, 875 thousand were built with a wood-frame method which translates to approximately 90%. For that 
reason, this study focused on wood-frame-type exterior wall assemblies. Furthermore, because of the nature of 
wood-frame type wall assemblies, compared to mass-wall type assemblies, this study did not include thermal 
inertia as a parameter of the performance analysis study. 
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Number of houses (in thousands) 


2008 2010 2012 2014 2016 2018 2020 2022 
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Figure 1: Type of Framing in New Single-Family Houses Completed in the U.S 


IECC: The model for this case study was created to be compliant with the IECC codes for climate zone A2 which 
includes the Bexar County area in Texas, U.S. (See Table 1). To identify the technical characteristics (R-values, 
U-values) that are necessary for the building to be code-compliant, data was obtained by the International Energy 
Conservation Code (IECC). Because the case study of this work is placed in San Antonio, Texas, the IECC values 
for Zone 2 were used to determine a baseline for the energy performance modeling. Specifically, for the baseline 
model, an R-13 wall was implemented in the Energy+ software. Furthermore, the floor R-value was also R-13, 
and the ceiling was R-38. The windows used had a U-factor of 0.40 and a Solar Heat Gain Coefficient (SHGC) of 


0.25. 


Table 1: IECC requirements table for Texas climate zones. 
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20 or 
Zone4 0.32 0.55 0.4 49 ae MI 10/13 10,2ft 10/13 
Zone3 0.32 0.55 0.25 38 cen 8/13 19 5/13 0 5/13 
Zone2 0.4 0.65 0.25 38 13 46 13 0 0 0 


Regarding infiltration rates, the IECC code requires a maximum of 5 Air Changes per Hour (ACH) when tested at 
a pressure of 50 pascals for climate zone 2 (See Table 2). Therefore, for this study, the baseline simulations model 
infiltration rate was set at SACH. 


Table 2: IECC requirements table for air leakage rates. 


Air Leakage Rate Climate Test Pressure 
Zone 
<5 ACH 1-2 50 Pascals 
<3 ACH 3-8 50 Pascals 


RS Means: Data on the specific characteristics of each wall assembly tested in this study and the associated costs 
of these assemblies were collected from the RS means publications, the industry’s leading standard for construction 
practices, and cost estimates (John Wiley & Sons., 2012). To identify the industry standard of exterior wall 
assemblies, the RS means database was used. This database of standard construction practices was used to 
determine the wood-frame assembly design and all the construction costs associated with these wall assemblies. 
It can be seen in Figure 3, how the standard wood-frame exterior wall has many layers in which it can be designed 
and constructed. The wall variations, and their associated thermal properties, were used as simulation iterations of 


this study. 
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Figure 3: RS means, industry-standard exterior wood frame wall assembly (Builder’s C., 2023) 


2.3 Energy performance modeling 


The energy performance modeling was done using the Design Builder software. Design Builder includes Energy 
Plus (Energy Plus, n.d.) which allows us to quantify the energy consumption and therefore the energy savings of 
the building. EnergyPlus is funded by the U.S. Department of Energy’s (DOE) Building Technologies Office 
(BTO), and managed by the National Renewable Energy Laboratory (NREL) 


The building models were designed to be identical with the exception of the construction of their external walls’ 
insulation (from R-13 to R29) and the infiltration level of the building (From ACH 5 to ACG 1.25). The “baseline” 
model was designed to be compliant with the IECC code for the climate zone of Bexar County (A2) which includes 
San Antonio. 


In order to perform the EnergyPlus simulation analysis for this study, first a 3D model of a residential house was 
designed in the DesignBuilder software (Figure 4). The general building design characteristics followed the 
“Prototype Residential Building Design for Energy and Sustainability Assessment” which was published by the 
National Institute of Standards and Technology (NIST) (Kneifel, 2012). The weather data required for the 
EnergyPlus energy simulations were downloaded internally via the DesignBuilder software for the geographical 
area of San Antonio, Bexar County, Texas. 


Figure 4: 3D axonometric view of wood frame building, in Design Builder software. 


3. ANALYSIS AND RESULTS 
3.1 Energy consumption results 


Figure 5 depicts the monthly energy consumption for heating and cooling for the baseline model (IECC compliant) 
of the energy simulation analysis. For the summer months of June, July, and August, energy consumption for 
heating was minimal. This result was expected for the Warm-humid climate of this case study. Following a similar 
pattern, the cooling loads for the colder months (December, January, and February) were also minimal. 


In the warm-humid climate of this case study, it is expected that the energy consumption for heating during the 
summer months of June, July, and August would be minimal. This is because warm-humid climates typically 
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experience high temperatures and high humidity levels during the summer, reducing the need for heating. 
Therefore, the baseline model's energy consumption for heating during these months would be negligible. 


Similarly, the cooling loads for the colder months of December, January, and February were also minimal in this 
case study. This can be attributed to the fact that colder months in warm-humid climates tend to have milder 
temperatures, reducing the need for cooling. As a result, the energy consumption for cooling in the baseline model 
during these months would be minimal. 


These patterns of minimal energy consumption for heating in summer and cooling in winter align with the expected 
behavior in warm-humid climates. It indicates that the simulated baseline model's design, in terms of heating and 
cooling systems, is appropriately responding to climate conditions. 


æ Sensible Cooling kBtu «=Total Cooling kBtu ===» Zone Heating kBtu 


System Loads(kBtu) 


Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec 


Figure 5: Monthly heating and cooling loads for the modeled house 


Figure 6 showcases the effect different exterior wall iterations (R13 to R-29 and ACH 5 to 1.25) have on the 
cooling (shown in red) and heating (shown in green) of the house. As expected, because of the climate of this study, 
the cooling energy demands are overall higher than the heating energy demands, following a very similar ratio for 
all the wall iterations. Furthermore, Infiltration rates show a significant effect on the HVAC energy consumption 
compared to the Insulation levels. 
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Figure 6: HVAC loads per square foot for different R-Value wall iterations 


Figure 7 is a 3-dimensional graph that showcases the relationship between exterior wall R-value, building 
infiltration (ACH), and the overall energy consumption normalized by square foot (BTU/FT?). As expected, a 
higher R-value lowers the energy consumption and a lower infiltration rate lowers the energy consumption. The 
colored strips of the graph represent energy consumption levels. This type of analysis allows us to easily identify 
and comment on the relationship between these 3 parameters, insulation, infiltration, and energy consumption. For 
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example, the most effective level of lowering energy consumption is depicted with a light blue color at the bottom 
of the graph, and it represents a consumption rate of 75-77.5 kKBTU/FT? for the building of our case study. It is 
apparent that a lower infiltration rate allows for this higher efficiency, even with the standard, minimum-compliant 
R-13 insulation. On the opposite spectrum of this graph, the inverse is also true- the building with a high, code- 
compliant infiltration rate is performing relatively poorly, despite the upgraded R-29 insulation. These findings 
quantify and validate the anecdotal rule of thumb of the residential building industry that a good air sealing 
job with marginal insulation is far better than a good insulation job with poor air sealing. 
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Figure 7: The effects of infiltration and insulation on energy consumption. 


3.2 Cost Estimate and Return on Investment Results 


The goal of this study is to provide an example of a valuable and applicable decision-making tool for many 
construction professionals in the pre-construction phase. Therefore, the energy consumption analysis of the 
previous chapter is supported by a cost analysis to evaluate the financial feasibility of these different wall 
assemblies. The cost analysis is presented in three parts. The first part of the cost analysis presents the cost 
estimating data, which includes the material and labor costs for the construction of these wall assemblies. This 
cost-estimating data was collected by the most recently published RS Means database(RS Means, 2023.). The 
second part consists in determining the operating cost analysis, where the energy consumption is converted into a 
monetary amount, based on the current average kWh cost of 0.14c for the area of Texas (electricityplans.com, 
2023). The third and final part of the cost estimation analysis compares the initial construction costs with the 
operating cost savings to calculate the return on investment (ROI) and the payback period for each of these different 
construction updates. 


3.2.1 Construction Cost Estimation 


In order to properly quantify the construction cost of the different wall assemblies of this study, RS Means data 
was used, which includes both the material cost and the labor cost. There are a number of exterior wall construction 
methods that can be used to reach the R-values considered in this study. However, only the most commonly used 
methods of insulation practices were used for this analysis. The two most common types of insulation are Batt 
insulation and rigid board foam insulation. Furthermore, as mentioned before, this study focuses on the most 
common framing type for the U.S. market, the wood-frame wall. Therefore, the geometric limitations of the wood 
frame wall also affected the cost estimation analysis. Specifically, the maximum R-value that can “fit” inside a 
2x4 framing wall using Batt insulation commonly found in the market is R-13 or R-15. That results in an increase 
of stud thickness from 2x4 to 2x6 in order to reach the values of R-25 and R-29. The cost associated with the 
thicker stud wall is taken into account for the cost estimate. Furthermore, a combination of Batt insulation and 
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rigid board insulation is used to reach the specified R-values. This is a common construction practice in the industry. 
The last two rows of Table 3 show the total cost of insulation as well as the cost differential between the enhanced 
insulations (R-17 through R-29) with respect to required insulation per IECC Code (R-13). 


Table 3: Cost Estimating data for exterior wall framing and insulation. 


R-13 R-17 R-21 R-25 R-29 
Framing type 2X4, 2X4 2X4 2X6 2X6 
Batt Insulation 3.5”, R13 3.5”, R13 3.5”, R13 6”, R21 6”, R21 
Exterior Board Insulation - 1”, R4 2”, R8 1”, R4 2”, R8 
Framing type cost (L.F.) 24.5 24.5 24.5 31 31 
Batt Insulation cost (S.F.) 1.24 1.24 1.24 1.48 1.48 
Board Insulation Cost (S.F.) - 1.3 1.61 1.3 1.61 
Total Cost / Linear foot 34.42 44.82 47.3 53.24 55.72 
Total Cost for the house (U.S. $) 4,268.08 5,557.68 5,865.2 6,601.76 6,909.28 


Additional cost 


Baseli 1,289.60 1,597.12 2,333.68 2,641.20 
from IECC code (U.S $) ree 


In the U.S., the common construction method used to reduce air leakage is the use of sealing tape (Building Energy 
Codes Program, 2018). Other methods can be used to reduce air leakage, the method used in this research project 
was Liquid Flash which can be applied in the building envelope (i.e.: bottom and top of the walls, around windows 
and doors, etc.) It was assumed that to reduce the ACH from 5 to 2.5 the Liquid Flash should be applied at the 
bottom and top of the walls and that to reduce the ACH from 2.5 to 1.25 the Liquid Flash should also be applied 
to windows and doors perimeter in addition to the walls. The last row of Table 4 shows the cost differential between 
the enhanced leakage (ACG 2.5 & 1.25) with respect to the required air leakage requirement per IECC Code (ACH 
5) in the climate zone of the study. 


Table 4: Cost Estimating data for air leakage sealing 


Cost 

Length/Perimeters ACH5 ACH2.5 ACH 1.25 
Length of Wall on the Bottom 178 FT _ 356.00 356.00 
(in contact with Slab) 
Length of Wall on Top (in 
contact with Roof) 178 FT -- 356.00 356.00 
Windows 8 Windows (4’x’4) = 128 FT -- -- 256.00 
Exterior Doors 2 Doors (3 x 6.5’) = 38 FT -- -- 76.00 


Additional cost 


from IECC code (U.S $) ~ 71200 1,04400 


3.2.2 Operating Cost Estimation 


Figure 8 presents the energy saving result differential for each R-value step increase and for each of the 3 
infiltrations (ACH 5, 2.5 & 1.25) categories tested. As expected, a higher R-value leads to more savings and a 
lower ACH level also leads to more energy savings. Furthermore, the operational savings in the case of poor 
infiltration rate (ACH5) are marginal. Specifically, for a greatly updated insulation to R-29, the operational savings 
are only 127 U.S. dollars, yearly, for the entire house of the case study. On the contrary, the improvements in 
infiltration rate, while maintaining the same R-13 insulation on the walls, lead to $ 453 and $ 702 in annual savings 
for ACH 2.5 and ACH 1.25 respectively. 
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Figure 8: Annual Savings in U.S. dollars for the 1076 square foot house. 
3.2.3 Return on Investment and Payback Period 


The Return on Investment (ROI) and payback period analysis of this study showcases the importance of this type 
of methodology as a pre-construction decision tool. Figure 9 shows the ROI in bars and the payback period in 
lines. As shown in Figure 9, improving the R-value from R-13 to R-17 without improving the ACH results in an 
ROI of 4.9% and a payback period of 20.4 years (R-17 blue bar and line in Figure 9), while if the R-value from R- 
13 to R-17 is improved and the ACH is also improved from ACH 5 to ACH 1.25 the ROI is 33.2% and the payback 
period is 3.0 years (R-17 orange bar and line in the Figure 9). The same pattern showing a significantly better ROI 
and Payback period with lower ACH appears in all the other tester wall iterations of this study. The highest ROI 
was estimated to be 67.3% for improving the building’s infiltration from ACH5 to ACH 1.25, without altering the 
insulation (R-13) of the exterior walls. 
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Figure 9: ROI and Payback Period 
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4. INTELLECTUAL MERIT 


The intellectual merit of this work is to provide a greater understanding of the effects of the insulative level and 
the infiltration level on a residential building’s performance. Furthermore, this analysis, combined with the 
construction characteristics and costs of the different wall assemblies can be used as a valuable decision-making 
tool during the pre-construction phase of a residential project. This work compared the energy-saving capability 
of upgraded exterior wall assemblies. Furthermore, the “upgraded” walls were tested under 3 different conditions 
of infiltration of the envelope of the case study. The results showcase the importance of infiltration levels for the 
overall performance of the residential case study. Furthermore, the methodology of this study can be replicated 
and scaled by construction professionals, in order to increase the economic competitiveness of a real project. Last 
but not least, a better understanding of the energy-saving capabilities of a better-built wall and the financial 
incentives presented in this study can promote a future of higher-standard construction methods for a myriad of 
houses across the globe. 


5. SUMMARY AND CONCLUSIONS 


This study highlights the importance of considering the construction and operating costs of residential buildings, 
with a focus on energy costs. Heating, ventilation, and air conditioning (HVAC) contribute significantly to a 
building's energy consumption. The research paper aims to analyze the trade-off between construction costs and 
operating costs by studying different exterior residential wall types in a warm-humid climate. The study uses 
quantitative methodology and a virtual case study to assess the impact of insulation and infiltration on energy 
consumption. The results highlight the importance of building infiltration on operational savings and return on 
investment (ROJ) for different wall types. The study suggests that energy conservation regulations, such as the 
International Energy Conservation Code (IECC), could be revised to reduce energy consumption and greenhouse 
gas emissions. 


In the study's climate, cooling energy demands are generally higher than heating energy demands, with a consistent 
ratio across different wall designs. Additionally, the infiltration rates, or the amount of air leakage, have a more 
significant influence on HVAC energy consumption compared to insulation levels. 


Higher R-values and lower infiltration rates result in lower energy consumption. It is evident that a lower 
infiltration rate contributes to higher efficiency, even with the standard R-13 insulation. On the other hand, a 
building with a high infiltration rate, despite having upgraded R-29 insulation, performs relatively poorly. These 
findings support the industry belief that good air sealing with minimal insulation is superior to good insulation 
with poor air sealing. 


The differential energy savings for each step increase in R-value and for each of the three tested infiltration 
categories. As expected, higher R-values result in greater energy savings, and lower air changes per hour (ACH) 
levels also lead to more savings. It is noted that the operational savings are minimal when dealing with poor 
infiltration rates (ACHS). Specifically, for a significant insulation upgrade to R-29, the yearly operational savings 
for the entire house in the case study amount to only $127. In contrast, improvements in infiltration rates, while 
maintaining the same R-13 insulation on the walls, result in annual savings of $453 for ACH2.5 and $702 for 
ACH1.25. 


The results of the study highlight the significance of utilizing the ROI (Return on Investment) analysis as a 
decision-making tool during the pre-construction phase. The data presented in the study demonstrates the impact 
of different factors on ROI. For instance, with a consistent R-value wall of R-17, the ROI is calculated to be 4.91% 
for a poorly air-sealed example (ACHS), while it significantly increases to 33.13% for an improved air-sealing 
example (ACH1.25). This pattern is observed across all the tested wall iterations in the study. The study also 
identifies the highest ROI, estimated at 186.93%, for enhancing the building's infiltration rate from ACHS5 to 
ACH2.5 without making any changes to the insulation of the exterior walls. These findings emphasize the 
importance of considering air sealing measures in order to maximize the return on investment, as it can have a 
substantial impact on energy savings and overall financial benefits. The study's distinctive contribution lies in its 
exploration of the typically overlooked aspect of infiltration rates in pre-construction building considerations, 
shedding light on the benefits of including these rates for a more holistic analysis of a building's performance. 


Future work to build upon this research includes conducting a comparative study in different climate zones and 
with multiple building types and geometries to analyze the trade-off between construction and operating costs; 
conducting long-term monitoring of building performance to assess the actual performance of different wall 
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designs over time; investigating the impact of occupant behavior on energy consumption; conducting a Life Cycle 
Costing (LCC) to evaluate the economic viability of different wall types; exploring the integration of renewable 
energy systems into residential buildings; focusing on retrofitting existing buildings to improve energy efficiency; 
and conducting a sensitivity analysis of parameters to determine their influence on energy consumption and ROI. 
Additionally, analyzing energy conservation regulations and providing policy recommendations for improving 
energy efficiency in residential buildings. 
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ABSTRACT: The automation and innovation have impacted Architecture, Engineering and Construction Industry, 
particularly when transitioning from traditional or conventional methods of construction to modular or 
Industrialized Building System (IBS). Thus, to ameliorate the processes surrounding built environment, 
researchers have been interested in the BIM 5 integration into construction industry. To ensure BIM’s adoption and 
integration into construction supply chain, supply chains management and procurement, we need to have an 
extensive comprehensive research base regarding global outlook of BIM ss relation with supply chain. The purpose 
of this study is to identify global scientific research patterns and trends related to BIM’s role in supply chain, by 
performing scientometric analysis. The scientometric analysis will help us analyze the work being done in this 
field and whether a significant literature exists that supports or helps in adoption of this idea. Most of the already 
existing research on BIM is performed on various other aspects of BIM like infrastructure sustainability, green 
buildings, design, framework, management of facilities and other BIM related managerial aspects. Thus, it is 
highly imperative to systematize and analyze the existing global scientific literature research to identify the global 
trends and frontiers on current BIM5 relation with construction supply chain. Not only this would pave the way 
towards identification of current relevant literature but would also lay down the foundations for digital 
transformation in construction. 


KEYWORDS: Building information modelling (BIM), construction, digitalisation, procurement, scientometric, 
supply chain management 


1. INTRODUCTION 


The concept of integration of services is not new, over the past several decades this concept has been applied in 
various industries. However, integration in construction supply chain is still lagging as opposed to other sectors. 
There are several reasons behind it, including complexities in construction processes owing to the fragmented 
procurement processes, multiple project stakeholders and several other challenges. The instability and 
fragmentation occur when supply chains are temporarily created or setup for each individual one-off construction 
projects. (Papadonikolaki, et al., 2015). Meanwhile, Building Information Modelling (BIM) is a kind of technology 
that can collect, create, impart, and share accurate information among different stakeholders of construction supply 
chain. BIM is equipped and able to tackle operational, organizational, and other technical complexities in the 
construction supply chain. BIM not only enhances visualisation, design coordination and construction sequencing 
but also aids in construction processes through its various built-in features. These features include clash detection 
visualisation, scheduling and controlling capabilities. (Rathnasinghe & Kulatunga, 2019). 


However, BIM's impact on challenges, faced by construction supply chain (CSC) at organisational level, is not 
thoroughly researched. The integration of BIM and CSC are still quite theoretical and conceptual, lagging in 
substantiated research. Only after extensive research and observing BIM's impact on the construction supply chain 
and its management, we can be fully assured of its contributions to the CSC. These gaps and voids in research thus 
need to be further explored. Thus, the following literature review will focus on the BIM's role and contributions 
so far to the CSC and its management. The literature review will also shed light on the underlying problems and 
background in the supply chain processes (Le et al., 2022). The research proposal has immense significance in 
terms of contributing to the theoretical knowledge related to the problem statement. The author believes that the 
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need for this proposal is highly imperative as it further compliments the existing knowledge in context of BIM's 
multi-dimensional relations to construction supply chain at large and procurement. 


The idea of construction supply chain has been around for a while now but still the constraints and challenges exist 
in construction supply chain owing to its uniqueness in each individual project. That means, every project being 
delivered is different in some ways from other projects rendering it difficult for integration to happen. This further 
tells that the construction supply chain is not strongly tightened interrelated process but a loose system which 
despite being important to global economy, is somewhat inefficient and untrustworthy. Thus, the research gap 
exists when it comes to addressing the issues (Papadonikolaki, et al., 2015). Moreover, if we narrow down the 
supply chain of Architecture, Engineering and Construction industry to mere procurement, again we'll find many 
unanswered problems. Integrating BIM with project delivery contract methods should ideally give rise to some 
new contract types specific to BIM usage but that hasn't been done yet and is a largely unexplored area. 


2. LITERATURE REVIEW 


The late twentieth century saw an evolution in logistics and supply chain management. This evolution had a direct 
impact and affected the construction supply chain and its management, due to its importance and magnitude. Since 
then, several problems and shortcomings in the construction supply chain management have been identified by the 
researchers involved in this field. Many problems were identified such as incoherence, lack of integration and 
inefficiency of the procurement or supply chain process (Khalfan et. al., 2015). A report on commercial 
construction industry highlighted that the mega projects take almost 20 percent more time to get completed than 
their scheduled completion date (McKinsey & Company, 2015). The report proclaimed that the construction 
industry still lagged in adopting innovative technologies when it came to information sharing about projects. 


BIM can play a pivotal role in data collection, integration, and provision. The goal is to obtain or gather data which 
is accurate enough that it can be channeled to other projects to aid or assist in the building processes including 
Construction Supply Chain. This includes the gathered data that can link the mega scale projects and can be 
modelled using Building Information Modelling systems (BIM), and then exploring further ways to link these 
information and models to the construction supply chain (Wang et. al, 2017). To help everyone involved to have a 
better understanding and clarity of the overall project, the data that is already gathered in the BIM system can help 
in driving labour and material requirements hence assisting in a construction supply chain management (Wang et. 
al, 2017). 


The approaches related to BIM and supply chain management revolve around supply chain integration with BIM 
to enhance and improve construction processes. Hence, it is believed that BIM can act as a catalyst for supply 
chain management adoption in construction (Wang et. al, 2017). The supply chain's integration is linked with both 
stakeholders and processes involved who are expected to coordinate and collaborate across various SC levels with 
long lasting trustworthy relationship. The researchers observed that supply chain stakeholders early risk 
management/allocation, involvement, participation, information technology investment and long-term 
procurement could further strengthen SC integration. (Getuli, et al., 2016). BIM can also help to enhance 
performances of mechanical, engineering, and plumbing aspects of the project. This can be done only with the 
stakeholder’s cooperation, that means, early joint planning, joint decision making and operations. Construction 
supply chain partnership can be helped by Building Information Modelling through data-information sharing, trust 
building and enduring long-term commitment (Le et al., 2022). 


During the early days of BIM development and while it was undergoing further technological advancement, one 
of the associated areas of interest was supply chain procurement and legal aspect of BIM. During early 21st century 
the BIM advocates observed that the two of the major hinderances in potential data sharing through BIM's 
platform; are legal constraints and varying frameworks of contracts. These two were seen to be obstacles in putting 
BIM in practice. Some of the key issues that surfaced include roles and responsibilities of individual stake holders 
being affected by use of BIM, liability and copyright issues associated with the BIM models, sharing of BIM’s 
data and model ownership, and stakeholders focusing more on their individual components of the project rather 
than giving due consideration to the bigger picture of process (Holzer, 2015). 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


3. METHODOLOGY 


This study aims to identify research patterns and trends about the global BIM research and how it is related to 
construction supply chain, construction supply chain management, and procurement in AEC industry. The research 
question focusses on how BIM is related to or being utilized for construction supply chain, supply chain 
management at large and procurement. The research method revolves around devising a framework of research 
design that defines and outlines the criteria for scientific databases, rules of search, defining data curtain, retrieving, 
processing, and analyzing of preliminary and final datasets. Therefore, this study employs the scientometric 
method, that is the sub-field of bibliometric analysis that is concerned with analysis and measuring of scientific 
literature. Bibliometric analysis can be defined as quantitative and statistical analysis of research data like journal 
publications, articles, and their patterns to identify the impact of those publications as well as to identify further 
research trends or patterns (Iftikhar et. al., 2019). The concept of bibliometric review was said to be introduced by 
Pritchard in 1969, who argued that this form of analysis had the potential to provide comprehensive insights into 
the research literature. (Pritchard, 1969). Fig. 1 outlines what would the scientometric analysis and mapping 
process would entail at each stage of the implementation. It illustrates what would be included at each step of this 
research project. 


Step 1: Research Design 
* Scientometric Analysis using citation data 


* Combination of analysis and visualization of 
networks. 


Step 2: Scientometric data curtain 
- Search keywords based 
' Relevant journal selection 


- Scopus and Web of Science databases 
-Inclusion and exclusion criteria are defined 


Step 3: Software and methodology 

-VOS viewer and Biblioshiny software used 

> -Co-occurrence, co-authorship, co-citation 
and other approaches are employed. 


— 


Step 4: Analysis and results 
- Identifying impactful publications, most cited articles — 
- Influential authors, emerging trends 
- Revealing intellectual structure using factor analysis 


Step 5: Discussions and Interpretations 
- Future research 
- Novel research findings 
-Research questions answered and objectives achieved. 


Fig. 1: Scientometric analysis and research mapping process 


4. RESULTS AND DISCUSSION 


Scientometric analysis was analyzed using several software as discussed and explained in research methods. The 
first analysis will revolve around outlining analytics related to Co-Authorship (type of analysis) and Authors (unit 
of analysis). The visualization results of 176 scientific documents retrieved from ‘Web of Science’ database, which 
were fed to VOS viewer. Hence, a visualization of co-authorship network was developed to analyze authors that 
had accounted for scientific research related to Building Information Modelling’s role in procurement of 
construction projects, amelioration of construction supply chain and playing part in supply chain’s management. 
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4.1 Visualisation Using VOSviewer 


The VOS viewer was set to ignore publications which had large number of authors (maximum level set at 25 
authors). The author’s threshold with minimum number of documents was set at 2, whereas the minimum number 
of citations of an author were set to 1 citation. After applying the analytical functions, out of 464 authors, 46 
authors met the threshold. These 46 authors had accounted for the most documents and citations in this respective 
research area. The table below outlines the data regarding these authors. The analysis reflects the most productive 
authors who had made significant research contributions to Building Information Modelling’s role in AEC industry, 
especially, the areas related to supply chain management and procurement. As shown in Table 1, the 3 most 
productive researchers were found to be Chong, Heap-yih (7 documents,146 citations), Wang, Xiangyu (6 
documents,142 citations) and Love, Peter (4 documents,124 citations). This was followed by authors Grilo, 
Antonio (3 documents, 95 citations), Jardim-goncalves, Ricardo (3 documents,95 citations) and Lee, cen-ying (3 
documents, 94 documents). 


Table 1: Publications and citations per author 


No Author Documents Citations 
1 chong, heap-yih 7 146 
2 wang, xiangyu 6 142 
3 _ love, peter e. d. 4 124 
4 grilo, antonio 3 95 
5 _jardim-goncalves, ricardo 3 95 
6 lee, cen-ying 3 94 
7__ sing, chun-pong 2 75 
8 _ matthews, jane 2 66 
9 hosseini, m. reza 3 46 

10 eadie, robert 2 33 
11 _edirisinghe, ruwini 2 32 
12 _skibniewski, miroslaw j. 2 31 
13 mahdjoubi, lamine 2 27 
14 rowlinson, steve 2 27 
15 holzer, dominik 2 26 
16 vass, susanna 2 21 
17 _ zhou, jingyang 2 21 
18 cheng, jack c. p. 3 19 
19 das, moumita 2 19 
20 ciribini, angelo 1. c. 2 15 
21 edwards, david john 2 15 
22 _mahamadu, abdul-majeed 2 15 
23 _scaysbrook, stephen 2 15 
24 meng, xianhai 2 14 
25 _joseph-akwara, esther 2 12 
26 law, kincho h. 2 12 
27 ___ gaterell, mark 2 10 
28 lee, cen ying 2 8 
29 _ tezel, algan 3 7 
30 _jin, ruoyu 2 7 
31 _li, haijiang 3 6 
32 lindblad, hannes 3 6 
33 ren, guogian 3 6 
34 _abrishami, sepehr 2 6 
35 abu-samra, soliman 2 6 
36 chaabane, amin 2 6 
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37 _phuoc luong le 


38 _thien-my dao 


39 _mejlaender-larsen, oystein 
40 fai, s. 


41 sacks, rafael 


42 _ariffin, hamizah liyana tajul 


43 mustaffa, nur emma 


44 papadonikolaki, eleni 
45 he, dandan 
46 li, zhongfu 
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4.2 Network Visualisation 


The network visualisation is shown in Figure 2 Network Visualization Co-authorship, that illustrates the total link 
strength of 60 with 32 links and 25 clusters. The co-authorship link strength refers to the collaboration strength of 
the authors. The nodes represent the author whereas thickness or size of nodes represent the documents. The edges 
or links refer to co-authorship collaboration between the authors, the thicker the lines, the stronger the 
collaborations. As mentioned previously, there are a total of 25 clusters, but we will analyse only top 3 clusters as 
they are indicative of strong collaborative research authorship between them. Fig. 2 shows the biggest cluster, 
cluster 1, is denoted by red colour and comprises of 5 top productive others; Chong, heap-yih being the most 
productive of the lot with total link strength of 12, 7 documents and 4 links (1 link with each other author in the 
cluster 1). Following Chong is Wang, Xiangyu with 3 links with Chong, Lee and Cin-Yeng. Wang had co-authored 
total of 6 documents with a total link strength of 10. 


A VOSviewer 


Fig. 2: Network visualization co-authorship 
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The cluster 2 consisted of 4 authors, 2 of these authors, Abrishami. and Abu-samra had strong collaboration links 
between them in terms of co-authorship of publications as both had published 2 documents but their total link 
strength was 5 each and had links with each other as well as 2 other authors in the cluster. The other author Husseini 
had the same total link strength of 5, however, he had authored 3 publications only with each of the fellow authors 
in its cluster. However, in blue colored cluster 3, author Love, Peter’s node and edges were indicative of very 
strong collaborative effort with rest of the authors in the cluster. Love had total of 4 documents with total link 
strength of 6 and 4 documents thus depicting significant co-authorship whereas rest of the 3 authors namely Sing, 
Chun-pong, Matthews, Jane and Zhou, Jingyang had each collaborated 2 documents and had link strength of 4. 


4.3 Co-occurrence keyword analysis 


Over the past couple of years there has been increased involvement and information about Building Information 
Modelling in every aspect of Built environment and AEC Industry at large. This has led to evolvement of numerous 
themes and topics in research related to BIM’s involvement in the construction industry. In this section we’ll be 
discussing analytics surrounding co-occurring keywords in our results’ dataset. Keywords play an important role 
by serving as reference point, aiding the contents’ description and conceptual understanding in research literature. 
(Akinlolu, et al., 2020). Hence to perform co-occurring analysis of keywords, data from Web of Science was 
imported into VOS viewer. After feeding the said data into VOS viewer, the threshold for minimum occurrence of 
a keyword was set to 5 keywords. Hence, out of 781 keywords, 34 met the threshold. Table 2 tabulated the most 
recurring or co-occurring keywords in decreasing order of occurrence. 


Table 2: Keyword occurrences 


No Keyword Occurences Total link strength 
l1 bim 57 150 
2__ building information modelling 33 100 
3 management 29 108 
4 _ procurement 28 80 
5 construction 24 66 
6 performance 23 89 
7 _ design 22 86 
8 framework 22 90 
9 implementation 21 94 

10 model 21 84 
11 _ collaboration 17 53 
12 _ building information modelling 15 29 
13 building information modelling (bim) 15 36 
14 __ building information modelling (bim) 15 37 
15 information 15 51 
16 innovation 13 65 
17 _ project management 13 52 
18 system 13 53 
19 projects 12 57 
20 industry 10 57 
21 _ technology 9 42 
22 sustainability 8 27 
23 adoption 7 34 
24 construction industry 7 27 
25 _ construction supply chain 7 14 
26 information modelling bim 7 39 
27 _ construction projects 6 27 
28 interoperability 6 20 
29 _ systems 6 25 
30 infrastructure 5 14 
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31 _ipd 5 14 
32 _lean construction 5 19 
33 prefabrication 5 18 
34 simulation 5 27 


The degree of co-occurrence is determined by similarity of keywords as well as their proximity to one another. 
The 34 top productive and repetitive keywords from 176 research publications produced total of 4 clusters as 
shown in the figure below. The biggest cluster of all was cluster 1 with 13 keywords and is denoted by red color. 
The intertwining of links and proximity of nodes reinforces the point that the various aspects of construction 
industry and built environment are directly proportional and related to Building Information Modelling. 


4.3.1 Cluster 1 


The first and strongest cluster had various keywords with strong and highly imperative correlation and literature’s 
scientific structure thus helping in anticipating trends and establishing firm research base for future. As shown in 
Fig. 3, the first cluster had several keywords like building information model/modelling, project management, 
construction industry, lean construction, prefabrication, sustainability, and interoperability. 
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Fig. 3: Keyword co-occurrence network visualization 


This cluster’s keywords had most relatedness thus depicting that these areas of research in relation to BIM were 
most comprehensive. We’ll also look at the overlay visualization of these clusters as well in the figure below. The 
cluster 1 had research conducted on its keywords from 2015 to 2017.38 average year. The keywords like ‘lean 
construction’ and ‘prefabrication’ were the latest with average year 2017 moreover both keywords had 5 
occurrences each. This is indicative of the fact that although BIM’s involvement with sustainability and industry 
is couple of years old, but still witch each passing year we are seeing innovative concepts being researched about 
in BIM’s context. 
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4.3.2 Cluster 2 


This cluster had some interesting terminologies like technology, innovation, implementation, and adoption along 
with the other repetitive keywords like “BIM”, “construction” etc. As opposed to the cluster 1, this cluster had 
more recent research trends averaging between year 2017 to 2018. The keywords like “innovation” had strong 
links with “BIM adoption” and “construction collaboration”. Thus, it is pertinent to note that there has been 
increased inclusion of innovation and adoption in hot areas of global subject literature around BIM and the strong 
edges in “overlay visualization” are indicative of increased research trends. 


4.3.3 Cluster 3 


The third cluster as shown in Fig. 4 revolved around keywords like “collaboration” (17 occurrences, 22 links and 
avg. pub. Year 2016.4), “construction projects” (6 occurrences, 16 links, avg. pub. year 2017.5), “design” (22 
occurrences, 27 links, average year of publication 2016.80), “framework” (22 occurrences, 28 links and average 
year 2017), “procurement” (28 occurrences, 26 links, average pub. year 2016.19) whereas ‘”IPD or integrated 
project delivery” and “simulation” had 5 occurrences each with 14 and 19 links, and avg. pub. years. 2015.25 
and 2016.5 respectively. Keywords like “framework”, “procurement” and “simulation” were seen to be the most 
recent ones in this cluster, however, one common thing among all clusters is the strong link strength and 
proximity with the words ‘BIM’. Based on centrality of nodes and strength of links, an inference could be made 
that these keywords play an important role in diversifying the BIM’s research literature. 
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Fig. 4: Keyword cluster overlay visualization 
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4.3.4 Cluster 4 


The last cluster had total of 6 keywords that included BIM, building information modelling, construction supply 
chain, infrastructure, performance, and projects. The green and yellow nodes (representing year 2017 and 2018) 
of these keywords reflect the latest research developments and collaboration in these respective fields. If we look 
at the item density visualization in Figure 5, item density visualization it tells us about the keywords density at a 
particular point and is represented by a color. The colors vary from blue to green to yellow. The closer the proximity 
of the keyword to the large number of other keywords and the larger the weight of those items, the more the color 
of that point is yellow. Hence, if we analyze the density visualization, it is self-evident that keyword “BIM” has 
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bright yellow color and has proximity and relatedness with keywords like “performance”, “procurement”, 
“management”, “design”, “collaboration” and “construction”. Since the word “BIM or Building information 
modelling” has been repetitive in each cluster thus not all same keywords have same density. For example, item 
density visualization in Figure. 5 illustrates “construction supply chain” in proximity with ‘building information 


modelling’ in top left corner thus reiterating the connection between the two. 


performance 


Fig. 5: Item density visualization 
4.4 Collaboration network of countries 


Consequently, the dataset produced comprehensive scientific mapping for us to analyze. The collaboration network 
of countries was pulled out and examined, as shown in Figure 7: Collaboration Network of Countries. The figure 
illustrates the biggest collaboration node being represented by UK having the strongest cross collaboration link 
with Australia. UK was seen to have a collaboration network with Australia, Luxembourg, Canada, South Africa, 
Ireland, China, Hong Kong, and Ireland. Whereas, after removing the isolated nodes several collaboration trios 
were identified. These trios included USA, China, and Israel. Moreover, to have a clear picture of social structure, 
a country collaboration map was prepared as shown in Fig. 6, that showed intercontinental collaboration on BIM 
usage in supply chain and construction industry. 
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Fig. 6: Collaboration network of countries 
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Fig. 7: Countries of collaboration 
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Fig. 7 shows there are three shades of blue clearly seen in the map. The darkest shade of blue seen between China, 
Australia, UK, and USA represent most productive contribution, whereas the thicker pink linkages depict the 
strength of collaboration. The lighter shade of blue comes second in production of scientific literature on BIM, 
construction supply chain and procurement. 


4.5 Conclusions 


The Web of Science database helped discover 176 relevant publications which were imported into VOS viewer 
and as per research objectives the co-authorship analysis were performed initially. The co-occurrence authorship 
analysis depicted various patterns. The strongest partnership cluster between authors comprised of 5 authors 
denoted by red color and had thickest edges thus had biggest impact. We can see the biggest cluster of 5 authors 
although had the strongest links and collaboration but still the average publication years were between 2014-2015. 
This explains that the only strong collaborations on BIM’s relation with construction supply chain were done 
couple of years ago and are not recent. Whereas the yellow-colored clusters which are not only small in link 
strength and only consist of 2 or 3 authors, but also have less collaboration and occur isolated in the cluster, are 
the most recent ones. There is no robust or exponential growth in scientific research literature when it comes to 
BIM adoption into supply chain management, logistic and/or procurement. Thus, the future BIM research must be 
driven and molded towards these areas. 


The second scientometric analysis was Co-occurrence of keyword analysis, which consisted of 4 keyword clusters, 
with Cluster 1 being biggest containing 13 keywords, followed by 8,7 and 6 keywords respectively. It was pertinent 
to observe that the word “Construction Supply Chain” only appeared once ina cluster, with 7 occurrences in cluster 
4, with average publication year of 2017. The most recent or latest keyword cluster was seen to be ‘adoption’ 
(denoted by yellow) with strong links to other keywords like “technology”, “information”, “infrastructure”, 
“management”, “design”, “framework” etc. Thus, a clear pattern can be observed here that the already existing 
research regarding BIM’s adoption into infrastructure management and other fields is being explored but at the 
same time key area like supply chain is somewhat lagging. On the other hand, 280 publications from Scopus were 
exported to Biblioshiny for scientometric mapping purposes and collaboration network between the countries was 
analyzed. This was done to examine the global research patterns which would further help in identifying the parts 
of the world where lack of collaboration is significant. Hence paving the way for future researchers, belonging 
from those countries, to comprehensively explore into otherwise ignored BIM’s associated aspects, particularly in 
construction supply chain. 


The innovative contribution of this study is pertinent with the integration of BIM and CSC are still quite theoretical 
and conceptual, lagging in substantiated research. Only after extensive research and observing BIM’s impact on 
the construction supply chain and its management, its contributions are clear to the CSC. Study gaps in previous 
research highlighted BIM’s role on multi-dimensional aspects to supply chain. This study is highly imperative as 
it compliments the existing knowledge in context of BIM’s at large and particularly in relation to construction 
procurement. Current challenge includes the creation of a consistent information flow that hasn’t addressed BIM 
cooperation process among all construction stakeholders. Therefore, this study contributes towards addressing 
these practical issues by outlining a framework for the integration of BIM and CSC. An example of successful 
application of the relationship is when BIM acts as an information integrator while CSC is a secure environment 
for collaboration in real-world construction project. The BIM-based CSC multi-model integration framework is 
therefore crucial in identifying, analyzing, and making full use of the organisational, operational, and technical 
complexity. Additional real-world cases can be used to calibrate the model in the future to establish successful 
application of the relationship between BIM and the construction supply chain. 
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ABSTRACT: A clear benefit of e-procurement technology in the construction sector is its capacity for in reducing 
waste and costs. Despite its successful adoption in other major industries, the take-up of e-procurement in the 
construction industry has generally been slow. A variety of barriers to adoption have been identified in literature, 
predominantly at an international level. Whilst the benefits of e-procurement are well-known and the challenges 
and barriers to the adoption of these systems in the construction industry has been well-documented, research into 
actual outcomes following the adoption of e-procurement systems through a case study analysis is limited. Tracking 
and measurement of adoption rates in the Australian construction industry is particularly scarce. To build upon 
and add to the existing body of contemporary literature, this study seeks to examine the adoption of e-procurement 
technologies in the Australian construction industry. As the Australian construction sector enters a period of high 
inflation, technologies such as e-procurement have a critical role in mitigating these price escalations. 
Understanding the barriers and opportunities for wider adoption of e-procurement in the Australian construction 
industry is also a clear benefit with its capacity for digital transformation in the construction sector. 


KEYWORDS: E-procurement, technologies, digital transformation, construction, Australia. 


1. INTRODUCTION 


The construction industry is one of the largest sectors in the Australian economy, employing some 1.03 million 
workers across 395,000 individual businesses (IBISWorld, 2021). The industry accounts for 9% of Australian 
Gross Domestic Product (GDP) with the total value of construction work done over the 12-months to December 
2021 in the order of $214 billion (ABS, 2022). Procurement management is a well-established technique utilised 
by firms to drive sustainable competitive advantages during periods of economic turbulence (Hong and Kwon, 
2012). Since emerging more than two decades ago, e-procurement technologies have increased significantly across 
many sectors (e.g., manufacturing, wholesaling) as both government and private firms have realised the benefits 
of e-procurement platforms, including, inter alia, increased transparency and accountability, improved 
sustainability, and cost savings (Deraman et al., 2019). The evolution of e-procurement platforms has evolved 
greatly over this period, growing from simple electronic-based systems to fully integrated, web-based platforms. 
Despites its successful adoption in other major industries, the take-up of e-procurement in the construction industry 
has generally been slow and low (Afolabi et al., 2019). 


A clear benefit of e-procurement technology in the construction sector is its capacity to reduce waste and costs. 
For instance, e-procurement enables building and construction firms to more accurately review and cost projects 
at the procurement stage and enables more efficient management processes on-site. Furthermore, e-procurement 
enables clients to achieve more competitive pricing through contacting more potential suppliers without increasing 
overheads. Technologies such as e- procurement systems have a critical role in mitigating price escalations and 
while the challenges to the adoption of these systems in the construction industry has been well-documented, 
research into actual outcomes following the adoption of e-procurement systems is limited. Tracking adoption rates 
in the Australian construction industry is particularly scarce. To build upon and add to the existing body of 
contemporary literature, this study applied the science mapping approach into the area of e-procurement in 
construction. It aims to provide insight into the existing challenges, benefits, and adoptions rates of e-procurement 
in the Australian construction industry and seeks to provide recommendations for future adoption. 


2. LITERATURE REVIEW 


E-procurement is the use of the Internet to support the delivery of procurement tasks. More specifically, it is an 
aspect of e-Commerce that incorporates web-based applications and communication technologies to carry out 
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procurement activities such as sending and receiving tender information, submission of tenders, acquisition of 
materials, equipment and services, and payment of goods and services (Ibem and Laryea, 2015). A study conducted 
in the United Kingdom by Eadie et al., (2010) sort to identify the leading barriers and benefits in e-procurement. 
A survey of 775 construction organizations was conducted and revealed that the leading benefits were ‘Process, 
transaction and administration cost savings’, “Convenience of archiving completed work’, and ‘Increased quality 
through increased accuracy’. These benefits support the findings of Brandon-Jones (2017) who suggests that the 
use of e-procurement can deliver significant operational benefits, including improved delivery accuracy, reduced 
transaction costs and greater control over organization procurement. Furthermore, Yevu and Yu (2019) found that 
drivers of e-procurement can be broken into seven categories: external drivers, project-level drivers, technological 
and process-level drivers, company-level drivers, individual-level drivers, service satisfaction drivers and 
sustainability concept drivers. Interestingly, they note that modern construction concepts such as sustainability and 
client satisfaction are influencing the adoption of e-procurement. Conversely, Eadie et al., (2010) identified the 
dominate barriers to e-procurement as ‘Prevention of tampering with documents’, ‘Confidentiality of information’, 
and ‘Resistance to change’. Another study by Yevu et al., (2021b) categorized barriers into six groups: 
technological usability and evolution, security and unsupportive environment, culture, infrastructure, unethical 
practices, and financial and skills related. 


A wide range of international studies has been undertaken, (Zunk et al., 2014; Ibem and Laryea, 2015; Afolabi et 
al., 2017, 2019; Tran et al., 2021) and found that whilst e-procurement is not a new concept in the varying industrial 
sectors, the construction industry has been slow in adoption compared to other industries such as manufacturing 
and retail business (Ibem and Laryea, 2015). These studies indicated the varying barriers and drivers e- 
procurement has on developing and developed economies. It has been discovered that the high cost and low access 
of Internet services in developing countries combined with lack of industry experience and training has had an 
adverse effect on initial uptake of e-procurement (Ibem et at., 2021; Tran et al., 2021). A common barrier found 
both in developing and developed countries was the lack of expertise and promotion of e-procurement (Yevu et 
al., 2021b; Zunk et al., 2014). Zunk et al., (2014) reports that some construction firms in Austria didn’t know what 
e-procurement was let alone the benefits. Afolabi et al., (2017) states that the benefits of e-procurement platforms 
should not be overlooked. They note it is a viable tool for increasing productivity and empowering construction 
professionals to exercise greater control of the construction process. Aghimien et al., (2021) added that 
digitalisation offers solutions to consistent challenges of delivering projects over budget, beyond the expected 
timeframes and not to specification. It is evident that a large volume of international research exists relating to e- 
procurement within the construction sector. 


Enterprise Resource Planning (ERP) systems have become commonplace applications in significant sectors. 
According to industry rankings and turnover, the top suppliers include SAP, ORACLE, Microsoft. ABAS, IFS and 
Step Ahead. Implementing BIM enables better project management, process efficiency, increased transparency, 
cost control, and real-time communication, just like with ERP systems. Additionally, the system may retain all 
technical information, drawings, and construction methods, and users can simultaneously work on different project 
phases throughout the course of the project's lifespan. BIM may be used to manage the technical elements of a 
building project as well as help with strategic procurement choices like choosing a contractor. There are many 
advantages of adopting ERP systems in the construction industry including automating procedures in client 
assistance, project management, cost predictions, employee management and procurement management through 
operational automation. Project management needs to be optimized since it is vital to the success of any 
construction company. Without good project management, the company would lose Clients and money. With all 
operational activities are automated by the ERP system, project management supervision is improved. ERP is a 
useful tool for cost estimation since it considers all important cost aspects, including materials, design, contracts, 
and transportation. Budgets for specific cost centres can be estimated and allocated to include overhead liabilities 
and even potential delays. 


In the construction sector, successful project execution depends on efficient communication, in which ERP systems 
are enhancing communication. One issue that construction businesses frequently struggle with is maintaining 
strong departmental communication. Project schedules may be impacted by departmental disconnections that slow 
down operations and business processors. Employees may rapidly tell executives and management on projects on 
their mobile devices thanks to mobile features. It is possible to handle external communication and updates using 
stakeholder and customer relationship management software that is integrated with the ERP system. Another key 
benefit of ERP in construction is it enables remote access to all pertinent files and data. ERP systems assist in the 
efficient and speedy centralisation of huge amounts of data. Cloud applications can be used by the latest technology, 
which eliminates the need for big, expensive servers. ERP and BIM technologies provide more efficient project 
management and improved cost accuracy. All project data may be kept in a single repository, and numerous users 
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can access it at once. Computing aided design applications can be integrated with BIM as a solution to improve 
efficiency in procurement in the construction industry. 


3. METHODOLOGY 


This review-based study applied the science mapping approach into the area of e-procurement in construction. It 
aims to provide insight into the existing challenges, benefits, and adoptions rates of e-procurement in the Australian 
construction industry and seeks to provide recommendations for future adoption. To achieve these objectives, a 
three-stage review process will be adopted. 


3.1 Bibliometric search 


The initial step of the review was the preliminary literature search using academic research database, Scopus. 
Scopus database has been used as the main source of information as it is considered a reliable source of scientific 
publications by academics (Baas et al., 2020). A comprehensive search was undertaken using a search string of 
keywords consisting of “e- procurement” or “procurement” or “sustainable procurement” or “digital procurement” 
and “construction” or “building”. Initially, 654 publications were found. These publications were further screened 
by only including publications dated between 2012 — 2022, journal articles exclusively and in English. This 
screening reduced the available literature samples to 492. Further screening of the remaining articles was 
conducted through the review of publication titles, abstracts, and keywords. Publications that were not closely 
related to this study were removed. This exercise highlighted that Automation in Construction, Construction 
Innovation and International Journal of Procurement Management had at least three papers each. A total of 82 
papers from 45 journals were selected as the literature sample for the scientometric analysis. 


3.2 Scientometric analysis 


The second step of the review involved a scientometric analysis method by adopting the bibliometric mapping 
software VOSviewer (Van Eck and Waltman, 2010). Scientometrics can be described as the quantitative approach 
applied in text mining of scientific publications (Hawkins, cited in Aghimien et al., 2021). Scientometrics are 
useful in facilitating a visual perspective of structural and dynamic aspects of scientific research and analysis 
outlined within existing literature (Olawumi and Chan, cited in Aghimien et al., 2021). Thus, it has allowed 
researchers to discover existing systematic literature-related findings by connecting literature theories that may 
have been missed in manual review studies. VOSviewer generates, visualises, and analyses bibliometric networks 
(Van Eck and Waltman, 2010). Specifically, its text mining capabilities can construct network maps of journal 
sources, co-citations, co-authorship, country of origin and co-occurring keywords sourced from abstracts and 
bodies of research articles (Van Eck and Waltman, 2011). The literature sample sourced from Step 1 was imported 
into VOSviewer to create a network of co-occurrence keywords, along with lead journal authors and sources, and 
country of origin. The co-occurrence network assisted in identifying the primary area of interest of e-procurement. 


4. RESULTS 


A total of 30 countries was identified from the literature sample. Australia has the highest number of publications 
(15), with 82 citations. This is followed by United Kingdom with 9 publications and 86 citations, and China with 
8 publications and 42 citations. Countries which closely followed included Malaysia, Hong Kong, and Nigeria. 
Interestingly, 14 out of the 30 countries only published one article between 2012 and 2022. The potential for further 
research within these countries to gain a greater understanding of e-procurement in construction could be beneficial 
for future researchers. 


4.1 Publications per author 


An authorship network map is used to identify the influential researchers in the e-procurement sector of 
construction research (Marzouk et al., 2022). A minimum of two published articles and five citations was set as 
the criteria of the authors. Tunji-Olayeni P., Yevu S.K., and Yu A.T.W. are the most productive scholars in this 
research domain based on the number of published articles. Additionally, Eadie, R., Perera, S., and Heaney, G., are 
in the same cluster, indicating their mutual relationship by citing one another’s work. The distance and connection 
lines between clusters can also be used to determine the authors linkage strength (Van Eck and Waltman, 2014). 
The quantitative measurements of the most prominent authors are explored in Table 1. The affiliation column 
shows the author’s institution at the time of publication and reveals that Ibem E.O. has the highest number of 
citations 79 for their three extracted documents. However, Yevu S.K., Yu A.T.W., and Tunji-Olayeni P. have the 
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highest number of publications of four extracted documents with 14, 14 and 53 citations, respectively. 


Table 1: Number of publications per author 


Author Affiliation Nos Citations 


Tunji-Olayeni P. Department of Building Technology, Covenant University, Nigeria 4 33 


Department of Building and Real Estate, The Hong Kong Polytechnic 
Yevu S.K. University, Kowloon, Hong Kong 4 14 


Department of Building and Real Estate, The Hong Kong Polytechnic 
YuA.T.W. University, Kowloon, Hong Kong 4 14 


Ibem E.O. Department of Architecture, Covenant University, Nigeria 3 79 


School of Construction Economics and Management, University of 


Layryea S. Witwatersrand, South Africa 2 60 


UNIDEMI, Faculdade de Ciéncias e Tecnologia da, Universidade Nova 


Grilo A. de Lisboa, Monte de Caparica, Portugal 2 58 
Costa A.A. CIST/Instituto Superior Técnico, University of Lisbon, Portugal 2 37 
Tavares L.V. CESUR/Instituto Superior Técnico, University of Lisbon, Portugal 2 37 


4.2 Publications per source 


Publications within the literature sample originated from 49 sources. Table 2 depicts five sources with at least three 
publications focusing on e-procurement in construction. Engineering, Construction and Architectural Management 
and Construction Innovation have the highest number of extractions with four articles each and interestingly, 14 
citations each. The most cited source is Automation in Construction, with three articles and 42 citations. 


Table 2: Number of publications per source 


Source Nos Citations 
Engineering, Construction and Architectural Management 4 14 
Construction Innovation 4 14 
Automation in Construction 3 42 
International Journal of Procurement Management 3 34 
International Journal of Construction Management 3 22 
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4.3 Pattern of keywords 


Through analysing keyword co-occurrences, knowledge advancements can be mapped to assist in understanding 
the knowledge structure of study (Su and Lee, 2010). To formulate such map, VOSviewer’s co-occurrence analysis 
of keywords was used. The assessed articles produced a total of 459 keywords. VOSviewer groups the keywords 
into clusters using a set criterion for co-occurrences (Aghimien et al., 2021). The clusters identify common areas 
of research in past studies. The threshold of minimum number of occurrences of keywords in automatically set to 
five. According to Aghimien et at. (2021) there is no joint agreement regarding the ideal number of minimum co- 
occurrences to be applied in the body of knowledge. To ensure an optimal representation of keywords was 
identified in this study, the minimum number of occurrences was set to three. A total of 37 keywords met this 
threshold with a total link strength (TLS) of 460. General keywords such as “construction”, “construction industry’, 
“construction project”, etc, were removed. Additionally, keywords with the same meaning, such as block-chain 
and blockchain were blended. Finally, 32 keywords were generated as illustrated in Fig. 1. 
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Fig. 1: Keyword co-occurrence network 


The size of the nodes shows the frequency of occurrence and the lines between the nodes represent their co- 
occurrence in the same publication (Van Eck and Waltman, 2018). The closer the two nodes are, the greater the 
number of co-occurrences of the two keywords. For example, architectural design is found with close relationship 
with blockchain, information management and building information modelling design. It has been identified that, 
e-procurement, supply chains and project management were the most frequently entered keywords in the context 
of e- procurement in construction. It is unsurprising that e-procurement is at the centre of this network given it was 
the main search keyword to which other keywords are linked. Table 3 shows the occurrence and TLS of each 
keyword. Furthermore, the analysis categorised keywords that appeared multiple times into six clusters. 
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Table 3: List of clusters and co-occurring keywords 


Cluster I (Red) Occ. TLS Cluster 3 (Blue) Occ. TLS 
Architectural design 5 16 Construction management 3 11 
Blockchain 8 14 E-tendering 3 4 
Building Information Modelling 3 11 Innovation 4 14 
Commerce 6 18 Procurement 9 19 
Costs 3 5 Public procurement 3 8 
Life cycle 3 7 Cluster 4 (Yellow) 
Project management 9 34 Adoption 3 8 
Cluster 2 (Green) Developing countries 3 10 
Australia 4 yi E-procurement 27 58 
Contractors 7 15 Institutionalisation 3 9 
Electronic commerce 4 13 Vietnam 3 8 
Management 3 9 Cluster 5 (Purple) 
Social value 3 8 Barriers 7 10 
Supply chains 15 47 Construction procurement 7 24 
Sustainability 3 5 Electronic data interchange 4 15 
Purchasing 6 21 
Occ. = Occurrence Cluster 6 (Teal) 
TLS = Total Link Strength E-procurement systems 4 16 


4.4 Pattern of keywords 


In addition to the network map, an overlay visualisation map is produced in VOSviewer. This map shows the 
keywords based on their year of publication during the period of 2016 to 2021. A coloured bar, identifying the 
years with a correlating colour is displayed in the bottom right- hand corner of the map (Van Eck and Waltman, 
2018). For example, keywords coloured blue were published between 2016-2017 and focused on e-procurement 
areas relating to developing countries, adoption, institutionalisation, e-commerce, and information management. 
Publications between 2017—2019 seemed to shift focus slightly to areas such as supply chains, costs, barriers, 
purchasing, e-tendering, and e-procurement systems. These keywords are displayed in dark green/blue on the 
visualisation map. The latest years on the map which include 2020 — 2021 see a wide range of topics be introduced, 
including, electronic data exchange, social value, blockchain, building information modelling (BIM), innovation, 
sustainability, construction management and life cycle. These keywords are represented in bright green/yellow. 
Fig. 2 illustrates the overlay visualisation map. Examining the overlay visualisation map in conjunction with the 
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TLS results in Table 3 (cluster table) suggests that future research of e-procurement in construction could explore 
areas relating to e- tendering, costs, life cycle, sustainability, adoption, and social value. These areas have been 
identified to have low TLS results from past research studies. Despite their importance, these research areas have 
received little attention. 
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Fig. 2: Overlay visualization map. 


5. DISCUSSION 


From the foregoing review and bibliometric analysis, the following four key areas have been identified as the main 
categories of this research. 


5.1 Adoption rates in other countries 


Whilst several global studies into the barriers of e-procurement have been carried out (Yevu et al., 2022), analysis 
into the actual adoption levels of e-procurement practices globally is not heavily featured in the literature reviewed. 
Instead, analysis on adoption rates is often country specific. Several in-depth studies into a broad mix of developed 
and developing countries has been identified, such as Austria (Zunk et al., 2014), South Africa (Ibem and Laryea, 
2015), Nigeria (Afolabi et al., 2019) and Vietnam (Tran et al., 2021). Whilst there has been some investigation into 
e-procurement practices in the Australian construction industry (Lin et al., 2022; Loosemore and Reid, 2019), there 
is very little contemporary literature into the adoption rates in the Australian context and how this compares 
globally. This Study will seek to partially address this gap in existing literature. 


5.2 Benefits and enablers of e-procurement 


There is a significant volume of research and analysis into the benefits and enablers of e- procurement practices. 
The drivers and barriers of e-procurement in the construction sector was comprehensively examined in Eadie et 
al. (2010), with the key drivers identified including (1) Process, transaction and administration cost savings; (2) 
Convenience of archiving completed work; (3) Increased quality, efficiency and accuracy; and (4) Shortened 
internal and external communication cycle times. The literature into drivers and benefits has continually 
strengthened over the past decade, with key contributions from Yevu et al., (2021), Khahro et al., (2021), 
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Pattanayak and Punyatoya (2021) and Wimalasena and Gunatilake (2018), amongst others. 


Despite the significant volume of literature, there are few examples of ‘firm-level’ studies which validate the 
perceived benefits of e-procurement from a construction organisation perspective. The need for more firm-level 
studies to measure the link between productivity and digitization in the context of the Australian construction 
industry was identified by Leviakangas et al., (2017). There is also a clear lack of Australian-focused literature 
which examines the potential benefits of e-procurement based on the local industry environment. 


5.3 Barriers to e-procurement adoption and challenges upon implementation 


As observed with benefits and enablers, there is an extensive volume of existing literature and research into the 
barriers to e-procurement adoption in the construction industry. Eadie et al. (2010, 2012) was one of the first to 
examine these barriers in significant detail, though extensive primary research has been carried out across multiple 
countries since that time, with prominent examples being Yevu et al., (2021), Yevu and Yu (2019), Nawi et al., 
(2017) and Afolabi et al., (2017). Yevu et. al (2021) categorised some 21 individual barriers to e-procurement 
adoption into six barrier groups based on an extensive review of existing literature and primary research. These 
barrier groups include: 


i. Technological Usability and Evolution-Related Barriers 
ii. Security and Unsupportive Environment-Related Barriers 
iii. Culture-Related Barriers 

iv. Infrastructure-Related Barriers 

v. Unethical Practices—Related Barriers. 

vi. Financial and Skill-Related Barriers 


There is more limited research into the challenges of e-procurement usage in the construction industry upon 
implementation. Primary research through direct interviews with construction industry professionals in Ibem et al. 
(2021), Isikdag (2019) Nawi et al. (2017) and Brandon- Jones (2017) provide useful insights into the first-hand 
challenges of industry participants upon implementation of e-procurement systems. Whilst there is an excellent 
base of research from which the Study can leverage, it is evident from the literature review that there is a lack of 
Australian-focused studies which have identified (if any) Australian-specific barriers and challenges of e- 
procurement practices in the construction sector. The Study will seek to examine this in detail and further build 
upon the strong evidence base of research into the barriers of e-procurement adoption and the challenges identified 
by industry participants upon implementation of e-procurement systems. 


5.4 Conclusions 


Existing literature has carried out extensive engagement with industry stakeholders to identify the barriers to 
adopting e-procurement practices and the challenges upon their implementation as observed in Ibem et al., (2021), 
Isikdag (2019) Nawi et al. (2017) and Brandon-Jones (2017). It is noted that none of these studies have focused 
specifically on the Australian construction industry. Whilst there is an excellent base of research from which this 
study can leverage, it is evident from the literature review that there is a lack of Australian-focused studies on 
specific barriers and challenges of e-procurement practices in the construction sector. The scope of this review 
focusses on the implementation of e-procurement in facilitating towards digital transformation in the construction 
sector. The study examines this in detail in the next phase of study and further build upon the strong evidence base 
of research into the barriers of e-procurement adoption and the challenges identified by industry participants upon 
implementation of e-procurement systems. Study findings may be used to guide construction companies' 
investment choices in digitally modernising the procurement function. Through their procurement processes, 
organisations may boost their digital transformation objectives. 
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ABSTRACT: Knowledge management (KM) is used by construction firms to establish organizational memory 
(OM) and consequently improve their performance by learning from past mistakes and best practices. Knowledge 
is the input to innovation; thus, the industry must adopt better ways of managing knowledge for advancement in 
the construction processes. Knowledge management consists of locating, modifying, and sharing knowledge to 
meet the needs of the current fast-paced sector. Various tools have been developed to support KM and OM in 
construction companies, however, it is very important to adequately address the needs of the sector for successful 
implementations. The aim of this study is to analyze the existing KM tools and evaluate their compatibility with 
the necessities of today’s sector. First, knowledge types in the construction industry were outlined, and existing 
KM tools were evaluated. Then, expert interviews were performed with two representatives from a prominent 
construction and a prominent consulting firm to delineate the contemporaneous KM practices as well as the KM 
needs in the construction industry. Finally, current practice in KM in the construction sector is evaluated, and a 
vision is developed for a more effective KM approach that could support OM in construction firms. 


KEYWORDS: Knowledge Management, Organizational Memory, Construction Firms, Software Tools 


1. INTRODUCTION 


Intellectual assets are the primary capital in knowledge-intensive industries. For companies looking to thrive in 
knowledge-based work environments, in-house procedures should be created to accumulate, explore, and exploit 
corporate and individual knowledge. The term "knowledge management (KM)" was used in Western countries for 
the first time during the late 20th century in books, academic research, consulting, and organizational adaptation 
processes. Nevertheless, the management of organizational knowledge did not truly start until the mid-1990s. 
Implementations of KM mechanisms were somehow present in companies long before these concepts emerged, 
such as knowledge-sharing activities. In the early 1990s, sector leaders such as BP, Shell, and Chevron used KM 
initiatives before any academic publication (Quintas, 2005). This shows that KM techniques were naturally used 
as a part of cooperative incentives to gain a competitive advantage and improve business performance. Today, KM 
is at the center of any modern business as rapid developments in information and communication technologies 
influence a profound shift from tangible assets to intangible assets focusing on people and knowledge. Even though 
many sectors with such needs have adapted ways of managing knowledge today, it is only in the last ten years that 
companies in the construction sector have started familiarizing themselves with the KM concept (Rezgui & Miles, 
2011). 


In order to benefit from KM as a pursuit of enriching organizational memory (OM) and achieving productivity, 
the types of knowledge encountered must be analyzed. There is a popular attempt to categorize types of knowledge 
in order to ease comprehending, storing, and sharing it. Each categorization considers different aspects of 
knowledge that guide the analysis in the specific field. The most applicable and renowned categorization for the 
knowledge management field belongs to Polanyi (Polanyi, 1966), which divides knowledge into tacit and explicit 
knowledge. Explicit knowledge is the recordable or documented information that can be written, coded, or saved 
with the intention to transmit knowledge. Tacit knowledge is the knowledge an individual owns, gathered from 
their insights, personal experience, and observations. 


The construction sector is a knowledge-based industry comprising different project participants such as designers, 
engineers, contractors, and clients that generate overwhelming amounts of data in each project. Effective KM is 
crucial to organize and benefit from this data overflow. In the past decade, much research has focused on KM for 
various other sectors, such as consulting, finance, or computer science. Despite the evident need, the construction 
sector is still lacking effective KM tools to foster organizational memory in a project-based environment. The aim 
of this study is to explore the KM tools used for gathering, sharing, and storing tacit and explicit knowledge in 
construction companies. Since KM is effectively used in only a few initiator companies, the authors investigated 
the steps taken by these pioneers to delineate KM best practices for the construction sector. 


This paper reports the results of the two pilot interviews done with the senior managers of two multinational firms. 


Referee List (DOI: 10.36253/fup_referee_list) 
FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup_best_practice) 


Bartu Kologlu, Deniz Artan, A Preliminary Investigation of Knowledge Management Tools for the Construction Sector, pp. 499-507, © 2023 Author(s), 
CC BY NC 4.0, DOI 10.36253/979-12-215-0289-3.48 


The first firm is a consulting firm operating in 50 countries with more than 55 years of experience in the consulting 
sector. The interview was performed with a senior project director who has 10 years of experience in the sector. 
The second firm was chosen among the few contractors that use KM effectively, which operates in 50 countries 
with 107 years of experience in the construction sector. They are among the sector leaders with more than 80,000 
employees. The interview was performed with a country manager who has 20 years of experience in the sector. 
Both experts were asked the same set of 12 questions under three main categories, which were: Data Collection, 
Data Accessibility and Usage, and Knowledge Management (See Appendix for the Questionnaire Form). The 
interviews with the company representatives took approximately one and a half hours each, during which the 
experts were guided by the questions and expressed their views on the topics driven from, but not limited to, the 
questionnaire. 


2. EXPLICIT & TACIT KNOWLEDGE 


Resources and competencies are critical factors for companies to survive in an evolving and fiercely competitive 
atmosphere in the knowledge-based economy (Subramaniam & Youndt, 2005). Thus, one of the biggest challenges 
is to be able to distinguish characteristics of knowledge from information. Knowledge can be either explicit or 
tacit. It is also important to distinguish tacit and explicit knowledge in order to comprehend the notion of 
organizational knowledge 


Explicit knowledge is founded on widely recognized and objective standards. It is archived in the form of written 
procedures or documents. Therefore, it can be codified and communicated with relative ease. It encompasses the 
majority of knowledge exchange inside companies. Since explicit knowledge can be easily documented, 
formalized, and expressed, processes of sharing knowledge tend to be more widely used in the workplace. Several 
management tools are used to increase the willingness of employees to share their explicit knowledge, such as 
handbooks and information technology systems (Coakes, 2006). Since this knowledge can be codified, it may be 
reused repeatedly and is, hence, simpler to convey. Design codes of practice, performance requirements, paper- 
based or electronic drawings, and building methods are a few examples of explicit knowledge in the construction 
industry (Charles & Robinson, 2011). Other instances of explicit information include design sketches and 
photographs, 3-D models, and textbooks. Explicit knowledge is the data that can be interpreted by others once it 
has been codified. People with supplementary knowledge who are able to understand the "codes" and derive 
meaning from them may be able to understand the presented knowledge. Even this process of comprehending or 
deriving meaning from knowledge requires the application of implicit interpretational, evaluative, and generalizing 
skills. 


The foundation of tacit knowledge sharing is human experience (Nonaka & Takeuchi, 1995). Polanyi (1966) 
described tacit knowledge as instinctive knowledge that cannot be expressed coherently by means of words; it is 
acquired via collective involvement and can be challenging to describe, systematize, and transmit. The informal 
adoption of taught behavior and methods obtains the uncodified and disembodied know-how. Furthermore, tacit 
knowledge cannot be directly conveyed to someone since knowledge and task performance have distinctive 
personal qualities, requiring the acquirer to adjust their mindset. Hence, the degree to which it is conveyed varies. 
Tacit knowledge can be maintained by individuals or as a team in the form of collective experiences and 
assessments of events. Employee objectives, capabilities, routines, and intangible information are sources of 
individual tacit knowledge. On the other hand, collective tacit knowledge may arise from various notions, 
including top management strategies, organizational agreement on previous shared experiences, company 
procedures, company culture, and professional customs (Lyles & Schwenk, 1992). Tacit knowledge can also be 
described as knowledge that has been converted into a habit and possesses a personal quality, as well as being very 
context-specific. The reality of tacit knowledge is that the less clear and codified the tacit know-how, the more 
difficult it is for individuals and businesses to internalize. Academics and managers have overlooked the notion of 
tacit knowledge until recently, although it now plays a major role in corporate growth and economic 
competitiveness (Howells, 1996). Transmitting tacit knowledge is frequently done primarily through direct 
conversation. Some tacit knowledge transfers are official as a result of training programs or seminars, while others 
are more informal as a consequence of interdepartmental work teams, unofficial social networks, and personnel 
interactions. The desire and ability of individuals to share their knowledge and apply it in practice are crucial to 
the formal and informal transmission of tacit knowledge (Holste & Fields, 2010). On the other hand, explicit 
knowledge is able to be formalized and communicated through structured and methodical means, such as in the 
form of rules and procedures (Nonaka & Takeuchi, 1995). 


The difference between individual and collective explicit knowledge is that individual explicit knowledge consists 
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of expertise and abilities that are easily teachable or writable, whereas collective explicit knowledge lies in standard 
operating procedures, record keeping, IT systems, and policies (Brown & Duguid, 1991). Regarding innovation 
speed and financial success, tacit knowledge sharing is more influential than explicit knowledge sharing, whereas 
innovation quality and operational efficiency are influenced more by explicit knowledge sharing (Wang & Wang, 
2012). Thus, companies must learn to share and store both knowledge practices to unlock their potential benefits 
fully. The following statement explains the criticality of harmonizing explicit knowledge with tacit knowledge; “If 
Nasa wanted to go to the moon again, it would have to start from scratch, having lost not the data, but the human 
expertise that took it there last time” (Brown & Duguid, 2000,p.122). 


3. KNOWLEDGE MANAGEMENT 


In a knowledge economy, conducting business has opportunities as well as drawbacks. The opportunities include 
the potential for expanding market share, enhancing productivity, and increasing profitability through innovation 
and efficient knowledge asset management. The key difficulties are dealing with rising global competitiveness, 
shifting levels and patterns of client, customer, and societal demands, and the speed as well as effects of change in 
information and communication technologies (ICT) (Charles & Robinson, 2011). In order to gain a competitive 
edge, one must be able to use knowledge efficiently. A common question is whether organizations store knowledge 
in memory similarly to how people do. The answer is that there is a rising notion that organizations do have 
frameworks, practices, structures, and other tangible artifacts that demonstrate the existence of knowledge encoded 
in the organizational culture. 


The formation of an organizational memory (OM) within an organization is a critical knowledge management 
activity that promotes the organizational learning (OL) processes (Ozorhon, Dikmen & Birgonul, 2005). OM can 
be defined as the means by which knowledge from the past is brought to bear on present activities; thus, it helps 
to learn from previous experiences (Stein & Zwass, 1995). OM becomes a corporate asset by sharing, organizing, 
storing, and reusing the knowledge created previously. The knowledge management activities within organizations 
should aim to enhance the OM. 


OM requires continuous improvement and growth of organizational knowledge, which means that both the 
organizations and the individuals within them must be constant learners. One important aspect of KM is its need 
to reinvent your organization through learning constantly. Experience-based knowledge is incorporated into 
procedures and is embedded in technologies and systems. Organizational routines and a culture that encourages 
the creation, assimilation, and abandonment of outdated information and practices must be developed in order to 
promote continuous change. Organizations must accomplish two goals that may be in conflict with one another: 
first, they must build their knowledge bases over time and draw lessons from their past experiences; second, they 
must make sure that they are learning outside of their core competencies and develop the capacity to assimilate 
new knowledge in order to be able to respond to change (Quintas, 2005). The generation of knowledge is frequently 
seen as somehow more significant than knowledge reuse, more challenging to manage, and less 
dependent on information technology support. However, perhaps a more common organizational concern—and 
one that is unmistakably tied to organizational effectiveness—is the efficient reuse of knowledge (Markus, 2001). 
The reuse of knowledge in various decision-making mechanisms and circumstances is expected to result in the 
generation of new remarks that automatically update the organizational memory when stored back into the system. 
A cycle should be made where organizational memory is referred to on knowledge transactions, and outcomes are 
reflected back to enhance the organizational memory. 


Construction companies must implement knowledge management mechanisms in their daily routines to improve 
effectiveness and thrive in an overly competitive sector. In order to meet this objective, first, the sources of 
knowledge generation need to be analyzed, as well as the type of knowledge they generate. As explained in the 
previous section, the possible tools for regulating and sharing tacit and explicit knowledge differ due to the nature 
of the knowledge. 


4. KNOWLEDGE MANAGEMENT TOOLS 


There are a variety of knowledge management tools available to choose from thus, it is vital to select the 
appropriate tool that addresses the goals of the organization adequately. The KM tools can be categorized as IT 
and non-IT-based tools that are used to support the essential aspects of KM such as sharing, reusing, and locating 
knowledge. In order to distinguish between the two categories, experts suggest naming IT-based tools as KM 
technologies and non-IT-based as KM techniques (Al-Ghassani, Anumba, Carrillo, & Robinson, 2005). 
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KM techniques do not require IT tools to execute the sub-processes of KM, such as knowledge sharing. It is clear 
that the scope and nature of human knowledge are much broader than what can be encoded by IT tools. Some of 
these tools are; seminars, post-project reviews, communes of practice, project feedback mechanisms, mentor 
programs, and training programs. Knowledge has a social aspect thus, seminars, communes of practice (where 
different professions meet to interact), and training programs are great opportunities for employees from different 
backgrounds to meet and share knowledge. Whereas post-project reviews, project feedback mechanisms, and 
mentor programs promise a similar scenario to a master-apprentice model where junior individuals get to be 
criticized and influenced directly by a senior colleague, which is an extremely effective knowledge-sharing 
mechanism. These tools may seem simpler to implement when compared to IT-based KM tools, however, they 
hold a much greater value for the initialization of tacit knowledge when compared to IT-based tools. Often the 
highly skilled members of the working environment are unaware of their tacit knowledge, such as their problem- 
solving skills or the resources they use. For this reason, knowledge sharing becomes highly dependent on 
communication within the working environment. Tacit knowledge is personal, linked to experience and learning, 
and cannot be coded. This results in tacit knowledge being shared within groups with common learning experiences 
and understandings rooted in common practice via non-codified pathways (Brown & Duguid, 1998). 


The IT-based KM tools mainly focus on capturing codifiable knowledge. These tools act as a great OM archive 
that eases how organizations create an organizational learning and knowledge management culture. The data stored 
in software and hardware systems can be referred to and reused whenever necessary, making monitoring the data 
much simpler. Today, there is a variety of software-based programs on the market that offer diversified approaches 
to KM. 


Using Artificial Intelligence (AI) and Machine Learning (ML) based software for KM is one of the leading trends 
in the knowledge industry. The classification, labeling, and retrieval of data are only a few examples of knowledge 
management tasks that can automated with the help of AI. Large volumes of unstructured data may be analyzed 
by these technologies, making it simpler to find insightful patterns and trends in a company's knowledge base. 
According to Forrester Consulting’s principal analyst, Gualtieri (2016), between 60% to 73% of all the collected 
data within an enterprise goes unused for analytics. With the help of AI-driven KM tools, advanced data structuring 
could be done for an insight-driven data presentation for the knowledge seeker. The outcome is similar to a personal 
intelligent assistant that can revolutionize how knowledge workers consume meaningful information and increase 
their cognitive capacity by providing them with more efficient tools for processing, filtering, sorting, and 
navigating information sources (Jarrahi, Askay, Eshraghi & Smith, 2022). Thus, organizations can improve their 
search capabilities, use time more efficiently for knowledge management operations, and provide employees with 
more individualized content suggestions by implementing AI-powered algorithms. 


Another current KM-IT tool is the ontology-based KM system and its application. A knowledge management 
system based on an ontology is more capable of encouraging the integration of linked resources, 
identifying precise knowledge rapidly, and steering away a significant amount of unnecessary knowledge. The 
procedure transforms disorganized knowledge data into structured knowledge by transferring all the necessary 
information. Storage of knowledge is the process by which the metadata is extracted from the knowledge sources 
acquired, and knowledge objects are marked in the implication of ontology and metadata standards, with the aim 
of transforming semi-structured and unstructured knowledge into structured knowledge and storing it in the 
knowledge base (Zhang, Zhao, Wie & Chen, 2015). 


Whether it is a more futuristic approach to KM, such as Al-based tools or more simple cloud-based archive 
programs, these IT tools share distinct key functions to cover the majority of the needs of KM. Firstly, a “Document 
Management” function must be present to act as an archive with a correct taxonomy for material and track 
document changes when they occur. The second one is a “Knowledge Archive”. Knowledge bases store structured 
and unstructured information in the system. These could be not only documents but also tutorials, videos, etc. The 
third key functionality is the “Security System”. This feature limits accessibility for predetermined employees that 
determines which data is available to obtain. The fourth feature is a strong “Search Function”. This function aims 
to save time when searching for past documents. The final feature should be “Communication Tools”. 
Communication channels can make the systems much more efficient, especially when one has further questions 
on the uploaded material and can directly reach the author. 


5. CONSTRUCTION SECTOR & KM 


The activities of today's construction industry demand an increased level of knowledge, skills, and learning, as the 
sector is a multilayered knowledge-based environment that has knowledge input from different project parties 
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(Ferrada, Núñez, Neyem, Serpell, & Sepulveda, 2016). Explicit and tacit knowledge come together to form 
organizational knowledge. In every individual's thought lies an accumulation of tacit knowledge. It is a collection 
of experiences, observations, and intuition that can be either cognitive or technical. Examples of tacit knowledge 
in the context of construction may include estimating and tendering prices that have been prepared over time 
through practical experience in preparing bids, encountering the construction processes, interaction with 
clients/customers and project team members in the construction supply chain, as well as an understanding of 
markets. Experience-based, judgmental, and context-specific knowledge makes it challenging to codify and share 
this type of knowledge. 


Explicit knowledge in construction is generally the data obtained from site activities. This could be man-hours, 
machine hours, periodical reports, unit prices, and anything generated from real-life implementations. As a result, 
better ways of knowledge management should be the primary target to comprehend the overflow of data in the 
construction sector. However, this might be unfavorable at first for some managers due to the general nature of the 
lack of human resources or timely pressure on on-site activities. Thus, the general outcome and long-term benefits 
of adopting such an ideology must be made clear to decision-makers for the right resource allocation. Every 
employee in a construction organization must embrace a culture that values knowledge capture and sharing of 
knowledge. 


However, there are a set of socio-technical barriers defined by Rezgui & Miles (2011) that limit the progress of 
KM in the construction industry. Firstly, employees do not perceive any immediate benefits from sharing 
knowledge and experiences. In fact, this is seen as a possible threat to their status as "experts" since there is usually 
no encouragement for a supportive knowledge-sharing culture focused on all employees. (e.g., by implementing 
creative ways for rewards and recognition). Next, shelf solutions do not work, and there is a weak culture of 
software adoption. In order to perform their duties and access software, employees are frequently limited to a 
specific place, which is usually their office. However, access to information from construction sites is frequently 
constrained by network availability. Another obstacle is that the industry is divided and organized into numerous 
disciplines, each of which has its own rules and specialized terminology. There is not a particular language that 
captures a shared comprehension of construction principles utilized across disciplines. All these aforementioned 
challenges limit effective communication and the sharing of experiences. 


By actively participating in projects over an extended length of time, one can gain valuable construction knowledge. 
However, this is usually not the case, as employee turnover is radically high. The specific needs of the 
employees who will use the project data may not always be understood by those in charge of gathering and 
archiving it. Furthermore, data is gathered and archived at the end of the construction phase rather than being 
handled while it is being created. By now, it is likely that those who were aware of the project have moved on to 
other projects. Again, due to high turnover, many businesses keep archives projects however, it is challenging to 
get in touch with the original report authors. These projects should be available to be used with little (or no) 
consultation, this past data should have a rich representation of the data context. Lastly, decision-making objectives 
are frequently not noted or documented. The millions of spontaneous messages, phone calls, emails, and 
discussions that comprise much project-related information require complex methods to track and document. 


In order to understand the reflections of these limitations on current construction sector knowledge management 
practices, two interviews were conducted. The first interview is held with the country manager of a 100-year-old 
multi-national construction firm, which is among the top 20 highest-grossing international contractors according 
to ENR magazine 2023. The second interview is made with a senior consultant in one of the top global strategy 
consultancy firms by revenue, who has one of her expertise in KM. The reason for the second interview being 
made with a representative from the consultancy sector is due to the factor that they were one of the earliest 
adaptors of KM tools in their organizations. From the early stages of the introduction of the KM concept, the major 
consulting firms took advantage of the immense potential of information technology as the driving force in the 
business world. The ideology has always been similar to the one today, which combines well-known IT tools like 
databases to make it easier to gather, share, store, retrieve, and use knowledge (Easterby-Smith & Lyles, 2015). 
For this reason, comparing the approaches to KM of a successful construction firm to one of the best knowledge- 
managing sectors in business would outline the necessities to be implemented by the construction industry, which 
is the aim of this study. 


6. FINDINGS OF THE EXPERT INTERVIEWS 


The interview questions were categorized under three headlines: Data Collection, Data Accessibility/ Usage and 
Updates, and Knowledge Management. The results are explained with the participants being referred to as 
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“Consultancy (Firm) Representative” and “Construction (Firm) Representative”. On the first question of the Data 
Collection section, participants were asked if there is a department in their organization dedicated to collecting and 
storing data from past projects. It was revealed by the construction representative that there was no such dedicated 
department, but all the departments in the headquarters (such as Procurement, Legal Matters, HR, etc.) would 
collect their own data from the sites. The consultancy representative revealed that they did have a special team 
dedicated to knowledge management for every specific field of activity. Next, participants were asked if their 
organizations had a digital database and a predetermined taxonomy for storing this data and what type of data was 
chosen to be stored in this system. The construction representative explained that they do have a digital database 
to store this knowledge, but only the specific departments have a taxonomy to obtain worldwide uniformity, such 
as the finance department, cost control department, and HSEQ department. It was added that they try and store 
most of the explicit data generated from the site, such as the man-hours, machine hours, accident rates, etc. 
Similarly, the consultancy representative stated that they do have a company-wide digital database to store project 
data. It was revealed that the uploaded material usually is in project analysis reports that have some identifiers that 
make it easy to find it in the future, such as keywords, date, location of the project, project team, a summary page, 
ete. 


The next section was about Data Accessibility/Usage and Updates, where the first question inquired if the stored 
data from the previous projects could be accessible anytime when needed and who could access this data. The 
construction representative stated that this data is only accessible to the related departments at the headquarters, 
and site employees could only access it via headquarters. On the other hand, the consultant representative made 
clear that this data could be accessible to anyone at any time. The second question on this topic examined if the 
ongoing projects referred to the stored data, how often they referred to it, and what type of data was most frequently 
requested. The answers to this question were quite different, as the construction representative stated that even 
though there are times when the site refers to the stored data, it is not too regularly requested. He added that the 
most requested data from the headquarters is the sub-contractor-related data (such as their prices, if they have 
worked with them before, and their references). On the contrary, the consultant representative stated that it is 
referred to at the beginning of each project. The last question was asked to determine how frequently the system 
was updated with the knowledge currently produced. The construction representative stated that it was every month 
unless there was a special reason to make it more frequent, and the consultancy representative stated that the 
sanitized version (a version that prevents the disclosure of the client) was uploaded at the end of each project. 


The final section consisted of questions regarding the KM policies. The first question of this section was to 
determine the explicit knowledge-sharing methods used in their company. The construction representative stated 
that they have an e-learning platform for employees to work on themselves and that some of the end project reports 
and analyses are available for every employee to view. The major explicit knowledge sharing on the consulting 
firm is stated to be their online tool, where the majority of the project data is imported into. The last question was 
about the knowledge-sharing mechanisms of tacit knowledge. The construction representative stated that there are 
regular voluntary webinars and seminars made, but most employees are expected to obtain this knowledge from 
working with one another, similar to the master-apprentice model. The consultant representative stated that they 
too, have seminars on general business matters and seminars on expertise fields. However, it was explained that 
although these seminars are voluntary, they have a reward system, such as being awarded a certificate or conference 
invitation. Additionally, it was explained that they, too aim to convey the tacit knowledge by the mater-apprentice 
model but in a more structured way. Firstly, a pairing system is used within project groups where seniors and 
juniors are matched. Next, throughout a project, the juniors get to work with their seniors to understand the business 
approaches they take in action. Then, after the end of the project, there is a feedback mechanism that monitors the 
desired performance of individuals, which is a great way to restructure the knowledge learned from the project. 


7. DISCUSSION AND CONCLUSION 


The review performed on the requirements of the construction sector revealed that knowledge management tools 
need to (1) collect the contextual details in a structured, continuous and real-time manner, (2) overcome difficulties 
in the extraction of knowledge from text-based data, (3) encourage a knowledge sharing culture, (4) combat the 
limits of the fragmented industry in effective communication and the sharing of experiences, (5) meet the specific 
needs of the employees who will use the project data, not those in charge of gathering and archiving it, and (6) 
facilitate efficient reuse of knowledge and learning outside of core competencies. 


The findings from the expert interview show similarities as well as major differences. For example, even though 
each department is set to collect data from the site in the construction firm, it is ambiguous how much the 
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employees are involved in the real site activity when compared to knowledge management teams in consulting 
firms who act as part of the project for a period of time. Another major difference is seen with the KM software 
these companies use. The accessibility and usability of the KM tool of the construction firm seem considerably 
constrained in terms of creating a knowledge-sharing culture compared to the KM tool used in the consulting firm. 
Even though the diversity of data is richer in construction, due to a lack of accessibility from key players and site 
personnel, there are significant limits to embracing the concept as a company culture. Finally, the biggest gap 
between their KM ideologies is related to their approach to sharing tacit knowledge. We can see a clear company 
culture inside the consultancy firm that motivates and provokes both the knowledge owner and knowledge seeker 
to interact in a knowledge-sharing activity. This is either done by rewarding systems for participation in seminars 
or compulsory feedback mechanisms post-project. On the other hand, an environment is set for this knowledge- 
sharing interaction in the construction firm, however, it is left to the employee will and enthusiasm to engage in it. 


The interview results reveal that most of the determined KM requirements of the construction industry have not 
been incorporated into the current tools and practices. Whether it is due to the natural barriers of the construction 
firms, which is explained in the previous section, or being the latecomer to the KM concept, one thing is for sure; 
which is that are many areas of improvement for enhancing the engagement in KM within the construction firms. 
The sector has a huge advantage in generating enormous amounts of knowledge, which, if and when interpreted 
correctly, could result in a much more efficient, resilient, and technologically advanced sector. 


These challenges faced by construction companies can only be overcome by establishing and maintaining a 
knowledge culture where knowledge is valued and generated, shared, and utilized as an instinctive aspect of 
corporate activities. Organizations and the individuals within them must be constant learners, and this demands a 
clear vision, strong leadership, and solid processes from the corporation. If the construction industry is to build 
and maintain the capability in a knowledge economy, it must shift its adversarial culture to a sharing culture. 
Furthermore, it has to learn from each project and then transfer knowledge from projects to organizational bases 
to improve OM. A cycle should be made where organizational memory is referred to on knowledge transactions, 
and outcomes are reflected back to enhance the organizational memory. For future studies, this research that acts 
as a pilot study will be broadened with more interviews from the construction sector to understand the in-depth 
usage of KM tools within organizations. 


ACKNOWLEDGMENTS 


This thesis is based on Bartu Kologlu’s MSc. Thesis. We appreciate the significant contributions of all experts to 
the completion of this paper. 


REFERENCES 


Al-Ghassani, A. M., Anumba, C. J., Carrillo, P. M., & Robinson, H. S. (2005). Tools and Techniques for 
Knowledge Management. In C. J. Anumba, C. Egbu, & P. Carrillo (Eds.), Knowledge Management in Construction 
(pp. 10-30). London, UK: Blackwell Publishing. 


Brown, J. S., & Duguid, P. (1991). Organizational learning and communities-of-practice: Toward a unified view 
of working, learning, and innovation. Organization Science, 2(1), 40-57. https://doi.org/10.1287/orsc.2.1.40 


Brown, J. S., & Duguid, P. (1998). Organizing Knowledge. California Management Review, 40(3), 90-111. 
https://doi.org/10.2307/41165945 


Brown, J. S., & Duguid, P. (2000). The Social Life of Information. Boston, MA: Harvard Business School Press. 


Coakes, E. (2006). Storing and sharing knowledge: Supporting the management of knowledge made explicit in 
transnational organisations. The Learning Organization, 13(6), 579-593. 


Easterby-Smith, M., & Lyles, A. M. (2012). The Evolving Field of Organisational Learning and Knowledge 
Management. In M. Easterby-Smith, & A. M. Lyles (Eds.), Handbook of Organisational Learning and Knowledge 
Management (pp. 1-20). New York, NY: J. Wiley & Sons. 


Egbu, O. C., & Robinson, S. H. (2005). Construction as a Knowledge-Based Industry. In C. J. Anumba, C. Egbu, 
& P. Carrillo (Eds.), Knowledge Management in Construction (pp. 31-49). London, UK: Blackwell Publishing. 


505 


Ferrada, X., Núñez, D., Neyem, A., Serpell, A., & Sepulveda, M. (2016). A lessons-learned system for construction 
project management: A preliminary application. Procedia - Social and Behavioral Sciences, 226, 302-309. 
https://doi.org/10.1016/j.sbspro.2016.06.192 


Gualtieri, M. (2016). Hadoop is data’s darling for a reason. Forrester. Available at 
https://go.forrester.com/blogs/hadoop-is-datas-darling-for-a-reason/ 


Holste, J.S., & Fields, D. (2010). Trust and tacit knowledge sharing and use. Journal of Knowledge Management, 
14(1), 128-140. doi: 10.1108/13673271011015615 


Howells, J. (1996). Tacit knowledge, innovation and technology transfer. Technology Analysis and Strategic 
Management, 8(2), 91-106 


Jarrahi, M.H., Askay, D., Eshraghi, A., & Smith, P. (2022). Artificial intelligence and knowledge management: A 
partnership between human and AI. Business Horizons 66(1), 87—99 


Lyles, M. A., & Schwenk, C. (1992). Top management, strategy and organizational knowledge structure, Journal 
of Management Studies, 29(2), 155-74. 


Markus, L. M. (2001). Toward a Theory of Knowledge Reuse: Types of Knowledge Reuse Situations and Factors 
in Reuse Success. Journal of | Management Information Systems, 18(1), 57-93. 
doi:10.1080/07421222.2001.11045671 


Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company: How Japanese companies create the 
dynamics of innovation. Oxford University Press. 


Ozorhon, B., Dikmen, I., & Birgonul, M.T. (2005) . A case-based reasoning model as an organizational learning 
tool, Paper presented at CIB 2005 Helsinki Joint Symposium, Helsinki, Finland. Retrieved from 
https://www.irbnet.de/daten/iconda/CIB6822.pdf 


Polanyi, M. (1966). Human knowledge. Chicago, IL: The University of Chicago Press. 


Quintas, P. (2005). The Nature and Dimensions of Knowledge Management. In C. J. Anumba, C. Egbu, & P. 
Carrillo (Eds.), Knowledge Management in Construction (pp. 10-30). London, UK: Blackwell Publishing. 


Rezgui, Y., & Miles, J. (2011). Harvesting and Managing Knowledge in Construction. New York, NY: Spon Press. 


Stein, E. W., & Zwass, V. (1995). Actualizing Organizational Memory with Information Systems. Information 
Systems Research, 6(2), 85-117. Retrieved from http://www.jstor.org/stable/23011005 


Subramaniam, M., & Youndt, M. A. (2005). The Influence of Intellectual Capital on the Types of Innovative 
Capabilities. The Academy of Management Journal, 48(3), 450-463. 


Wang, Z., & Wang, N. (2012). Knowledge sharing, innovation and firm performance. Expert Systems with 
Applications, 39(10), 8899-8908. 


Zhang, J., Zhao, W., Xie, G., & Chen, H. (2011). Ontology-Based Knowledge Management System and 
Application, Procedia Engineering, 1021-1029. 


8. APPENDIX: QUESTIONNAIRE FORM 


This questionnaire form is part of the MSc thesis study of Bartu Kologlu at Istanbul Technical University 
Construction Management Program. The aim of the thesis is to understand the current knowledge management 
practices and the tools that construction companies use to maintain and enhance their organizational memory. The 
answers will only be used for academic purposes, and the answers will be evaluated anonymously without the 
identity of the participant/organization. 


1) Data Collection: One of the main capital of knowledge-intensive sectors such as construction/consulting is 
intellectual assets. Most of the processes are generated toward exploration, accumulation, and exploitation of 
individual and firm knowledge. Your company has been in the construction/consulting sector for many years 
and has completed many projects. 

1.1) Is there a department/process in your company that collects and stores the knowledge data acquired from 
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2) 


3) 


the projects? 

1.2) What kind of data is collected/stored from the previous projects (financials, man-hours, machine hours, 
financials, reports, etc.)? 

1.3) How often is this data collected? 

1.4) How is this data stored? Is there a digital database for this purpose? If yes, is there a predetermined 
taxonomy or a uniform filing system that is used to store the data? 


Data Accessibility and Usage 

2.1) Is the stored data from the previous projects accessible when needed? 

2.2) If yes, do the ongoing projects use this data? How often is the previous data used for ongoing projects? 

2.3) What type of data is used? Please list the specific information/data items used most frequently. 

2.4) Is there an IT program to access this data? If yes, what is the most critical aspect of this program to operate 
correctly? 


Knowledge Management: We can divide knowledge into two categories: Explicit and Tacit. Explicit 

Knowledge is the documented or recorded information that is written or saved. Tacit knowledge is the 

knowledge that an individual owns that is gathered from their personal experience, insights, and observations. 

3.1) What are the knowledge-sharing methods in your company for explicit knowledge (seminars, shared 
monthly reports, etc.)? 

3.2) Skilled members of a community of practitioners are often unaware of the tacit knowledge they possess, 
e.g., their problem recognition and problem-solving behavior, the rules that they follow, and the knowledge 
sources that they draw on. What are the methods in your company that convey tacit knowledge 
transactions? 

3.3) The sector operates in a project-based environment. How can you ensure that individual knowledge 
becomes a company asset and does not disappear when that person is no longer part of the company? 

3.4) Are there any other Organizational Learning practices your company performs? 
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ABSTRACT: This paper presents an enhanced BIM logger designed to capture both geometry and attribute 
changes of building element geometries, thereby offering a transparent source of representation of the BIM 
authoring process. The authors developed the logger and reproduction algorithm using the Revit C# API based on 
the analysis of information required to define building elements and associated attributes. The enhanced BIM log 
was evaluated through a case study of Villa Savoye designed by Le Corbusier. Despite negligible discrepancies, 
the results show that the enhanced BIM log can accurately represent the BIM authoring process capturing and 
reproducing 92.45% of the building elements from the original BIM model. Future research can focus on 
expanding the scope of logging and probing the potential of automating the BIM authoring process using these 
enhanced BIM logs. 


KEYWORDS: Building information modeling (BIM), BIM log mining, BIM authoring software, Custom BIM log, 
Authoring process reproducibility. 


1. INTRODUCTION 


The architecture, engineering, construction, and operation (AECO) industry has experienced a transformational 
shift with the widespread adoption of BIM technology. The shift has not only enhanced productivity but also 
improved informed decision-making within the sector (Sacks et al., 2018). Although BIM models—products of 
the BIM process—display the finalized decisions of a project, they encapsulate a wealth of insights due to the 
extensive decision-making endeavors underpinning their authoring. Consequently, the BIM authoring process can 
serve as a valuable knowledge repository for understanding the decision-making process. 


BIM log mining seeks to extract these insights by examining the BIM logs in detail. BIM logs serve as invaluable 
data reservoirs, capturing sequential events recorded during BIM software usage (Jang et al., 2023). Previous 
studies have explored various aspects of the BIM authoring process, from design authoring patterns 
(Yarmohammadi et al., 2017), productivity (Shin, 2023; Shin et al., 2022), and collaboration patterns (Zhang & 
Ashuri, 2018), to the specific roles of modelers (Forcael et al., 2020). 


Researchers have emphasized the significance of incorporating data attributes to elucidate the as-happened process 
within the log to attain reliable results from the analysis (Bose et al., 2013; Suriadi et al., 2017). Nonetheless, the 
BIM logs produced by prevailing BIM software overlook modifications in building elements undertaken during 
the BIM authoring phase because they were originally developed for maintenance operations (Autodesk, 2022). 
Consequently, such logs often miss the depth required to accurately capture the BIM authoring narrative (Jang et 
al., 2023; Yarmohammadi & Castro-Lacouture, 2018). Even though efforts have been made to enhance these logs 
through custom loggers (Gao et al., 2021; Jang et al., 2021; Kouhestani & Nik-Bakht, 2020; Pan & Zhang, 2021; 
Yarmohammadi & Castro-Lacouture, 2018), these adaptations still are not able to capture critical geometric 
nuances like shapes, scales, and locations—details pivotal to understanding the evolution and decision making 
involved in the BIM authorship. 


To address the issue, this paper introduces an improved BIM logger capable of capturing comprehensive details 
for the precise replication of the BIM authoring process. The methodology behind this advanced logger includes 
analyzing the essential data needed to describe building elements in Autodesk Revit, followed by the design of a 
custom BIM logger to record this crucial information. In addition, a reproduction algorithm was developed to 
evaluate the logger's accuracy in representing the BIM authoring process. The reproducibility of the logger was 
further validated through a case study. 


The paper is structured as follows. Section 2 reviews previous custom BIM loggers proposed in the literature, and 
Section 3 describes the research methodology employed in this study. Section 4 reviews the minimum information 
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requirements to define BIM elements in Revit, and Section 5 outlines the development of the enhanced BIM logger 
and reproduction algorithm. Section 6 evaluates the reproducibility of the enhanced BIM logger through a case 
study, and Section 7 concludes the paper. 


2. LITERATURE REVIEW 


BIM log mining is a data analysis approach that utilizes process mining techniques to explore BIM event logs 
collected during a BIM software operation. Process mining includes various techniques for automated process 
discovery, social network analysis, process optimization, case prediction, and history-based recommendations 
(Aalst et al., 2011). However, event log imperfections can lead to unreliable results, and researchers have 
introduced an incremental approach to evaluating event log fitness and a methodology to guide process mining 
execution (Bose et al., 2013; Suriadi et al., 2017). 


While improving event log quality has received significant attention, several studies have focused on enhancing 
the information included in the BIM logs. These include custom Revit logger which extracts element identifiers 
and bounding boxes (Yarmohammadi & Castro-Lacouture, 2018), IFC loggers which capture snapshots and 
identify the changes made between different versions of BIM models (Kouhestani & Nik-Bakht, 2020; Pan & 
Zhang, 2021), and command-object graphs to notate the geometric modeling sequence (Gao et al., 2021) to 
improve the understanding on modeling patterns. However, these custom logs still grapple with reproducing the 
BIM authoring process due to missing information to represent the geometric shape and attributes of the building 
elements. 


The BIM authoring process is often defined as a process in which 3D software is used to develop a BIM model 
based on criteria that are important to the translation of the building’s design (Messner et al., 2019). Accordingly, 
this process includes the addition, deletion, and modification of the geometry of building elements and their 
associated properties (Kouhestani & Nik-Bakht, 2020; Lin & Zhou, 2020). Through BIM authoring, building 
elements are refined to meet the level of development (LOD)—the information necessary to depict the building 
elements—within the BIM model. However, the BIM model only reflects the end result of the comprehensive BIM 
authoring process. Concurrently, the design decisions made during this process can be a rich and invaluable 
reservoir of information, encapsulating real-time design decision-making. Documenting these design decisions 
can also aid in extracting the design knowledge of architects, especially when paired with recently emerging data- 
based analysis techniques (Jang et al., 2023). In this context, this study aims to develop a custom logger designed 
to capture essential information, enabling the reproduction of the BIM authoring process. 


3. METHODOLOGY 


This study developed a three-step methodology to enhance the reproducibility of BIM logs as depicted in Fig. 1. 
The methodology involved developing a customized BIM authoring logger, implementing a BIM authoring 
process reproducer, and evaluating the reproducibility of the enhanced BIM log through a case study. 


Implementation: 
BIM authoring process 
reproducer 


Evaluation: 
Case study 


Implementation: 
BIM authoring logger 


FINISH 


Fig. 1: Research flowchart. 


To capture sufficient information in the BIM log to reproduce the BIM authoring process, the authors analyzed the 
minimum inputs required to represent building elements in Revit. As building elements are represented using 
classes in an object-oriented programming model, the information required for each class corresponding to a 
specific category of elements was analyzed. The log captures the geometric shape and attribute values to represent 
the building elements in a comma-separated value (CSV) format. The reproduction algorithm developed in this 
study iterates through the events recorded in the BIM log, identifies the command type (i.e., “ADDED,” 
“MODIFIED,” or “DELETED”), and executes them with reference to the “Comments” property (i.e., copied 
“ElementID”) of the building elements. 


The evaluation phase of this study involved modeling Villa Savoye designed by Le Corbusier for the case study, 
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during which the BIM authoring logger recorded the authoring process (Fig. 2). The events recorded in the 
enhanced BIM log were iterated using the BIM authoring process reproducer, and the reproducibility of the 
developed model was evaluated based on the volumetric center distances and volume differences between elements, 
as well as visual analysis of plans, elevations, sections, and 3D views. 


Log the authoring å > 
process of Villa ~~ + a 
N Savoye ez ie DA 
rA \ / \ 
/ \ Original model Evaluate ho FINISH | 
START reproducibility \ j 
\ f 
\ J > > < y 
S — Enhanced Reproduce the Revit g S — 
BIM log authoring process SS” 
Ra 
Reproduced model 


Fig. 2: Evaluation of BIM authoring logger and BIM authoring log reproducer. 


4. ENHANCED BIM LOGS 


This section provides a detailed overview of how building elements are defined in Revit and how the enhanced 
BIM log captures the necessary information to accurately represent the BIM authoring process. The study utilized 
Autodesk Revit 2023 and Revit C# API for the implementation of the enhanced BIM log. Building elements in 
Revit are represented using classes in an object-oriented programming model that has a hierarchical structure that 
reflects the physical structure of the building. Each class corresponds to a specific type of building element, such 
as walls, floors, roofs, doors, and windows. Fig. 3 illustrates the geometric shape and attribute values recorded for 
each building element category. 


Category Walls LoactionCurve Vertically 
i ea WallType / Height / Offset / Flip extruded wall 


Profile 
Normal extruded wall 
Floors \ Flat slab 
Profile 
SlopeArrow / Slope ——————— 


FloorType — Sloped slab 
Level 
Family HostElement / LocationPoint WindowType— Ñ 
in Windows 
Level 
DoorType ———— l Doors 
ColumnType LocationPoint ———_ —_—_——_. | Verical columns 


Others LocationCurve ————_——_______—_ / Slanted Columns 


Fig. 3: Geometric shape and attribute values recorded within the enhanced BIM log. 


Wall elements in Revit are classified into two categories—rectangular profile walls and nonrectangular profile 
walls—following the different creation methods for each type. Floor elements are classified into two types, flat 
floors, and sloped floors, and are created using different attributes depending on the type. Windows, doors, and 
columns are classified as “FamilyInstances,” and their representation is based on the placement of predefined 
instances on the selected base geometry. “FamilyInstances” have different categories based on the “FamilySymbol” 
used, such as “WindowType,” “DoorType,” and “ColumnType.” Windows and doors require a “LocationPoint” 
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and “HostElement,” while columns can be placed without a “HostElement,” depending on whether the 
representation of the slope is required. 


In addition to recording other required information items as string formats, the enhanced BIM log captures 
geometric bases, such as “LocationPoint,” “LocationCurve,” and “Profile,” and their respective subclasses, such 
as “Line,” “Arc,” “CylindricalHelix,” “Ellipse,” “HermiteSpline,” and “NurbsSpline.” The information required 
for each geometric base and its string format representation is presented in Table 1. The enhanced BIM logger 
records the geometric bases of the building elements represented during the BIM authoring process in the described 


format. 


Table 1: Definition of geometric classes. 


Classes Subclasses Input Requirements String Formats 
Location Point XYZ (X coordinate, Y coordinate, Z coordinate) (double, double, double) 
Line (endPoint1, endPoint2) Line, XYZ, XYZ] 
Arc (plane, radius, startAngle, endAngle) Arc, XYZ, double, double, double] 


CylindricalHelix 


(basePoint, radius, xVector, zVector, pitch, 
startAngle, endAngle) 


(center, xRadius, yRadius, xAxis, yAxis, 


CylindricalHelix, XYZ, double, XYZ, 


XYZ, double, double, double] 


Ellipse, XYZ, double, double, XYZ, XYZ, 


Location Curve Ellipse 
startParameter, endParameter) double, double] 
NurbsSpline, int, [List<double>, 
NurbsSpline (degree, knots, controlPoints, weights) 
Ilist<XYZ>, Ilist<double>] 
HermiteSpline, Ilist<X YZ>, bool, 
HermiteSpline (controlPoints,periodic,tangents) 
HermiteSplineTangents] 
{CurveLoop, LocationCurve), ..., 
CurveLoop CurveLoop (LocationCurve,, ..., LocationCurve,) 
LocationCurve,} 
Profile Profile (CurveLoop, ... ,CurveLoopn) Profile, CurveLoop), ... ,CurveLoopy 


Furthermore, multiple attribute values in Revit can be modified to better represent each building element. For 
instance, recording whether it was flipped was important in representing vertically extruded walls because the 
sequence of the wall layer can be positioned opposite. Meanwhile, a normal vector that the wall is facing is critical 
in the horizontally extruded walls. The initial values and modifications of the attributes are also recorded in the 
enhanced log, providing information on how the building elements were defined and developed during the BIM 
authoring process. 


5. REPRODUCTION ALGORITHM 


The authors implemented a reproduction algorithm to iterate through the events in the enhanced BIM log and 
repeat them, as illustrated in Fig. 4. The algorithm begins by identifying the command type of events. If the 
command type is “ADDED,” the algorithm adds an element of the recorded category and applies the recorded 
attribute values, while adding the “ElementID” of the event to the “Comments” attribute of the newly created BIM 
element. If the command type is “MODIFIED,” the algorithm queries the element with the “ElementID” recorded 
in the Comments value and applies the corresponding modification. If the command type is “DELETED,” the 
algorithm queries the elements with the “ElementID” recorded in the “Comments” value and deletes the element. 


511 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


Read BIM log with n events 


Add an element of E,,,.Category and 
add ElementiD on "Comments" 


Modify the element with E,,.ElementiD 
on "Comments" 


Delete the element with E,,.ElementID 
on "Comments" 


Fig. 4: Reproduction algorithm. 


6. EVALUATION OF REPRODUCIBILITY 


To assess the enhanced log and its ability to be reproduced, the authors of this study carried out a case study of 
Villa Savoye, designed by Le Corbusier. Using the BIM log from the authoring process, a reproduced BIM model 
was generated using the reproduction algorithm, as depicted in part (a) of Fig. 5. 


Original BIM model Reproduced BIM model Original BIM model Reproduced BIM model 
(a) BIM models (b) 3D views 


Misrepresented geometry Misguided dependant elements Mishandled element overlaps 


—— Original BIM model —— Reproduced BIM model 
(c) Types of misrepresentation 


Fig. 5: Comparison between the original BIM model and reproduced BIM model. 


The BIM model comprised 158 elements, which included 97 walls, 8 slabs, 8 windows, 19 doors, and 27 columns, 
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with 2,836 events logged during the process. The reproduced model captured every element present in the original. 
A comparison revealed minimal average distances between the volumetric centers of the elements: 3.6440E-07 for 
walls, 2.1470E-07 for floors, 1.4760E-07 for windows, 1.6139E-07 for doors, and 1.4380E-07 for columns. 
Volume differences were also analyzed, with disparities noted as 0.1876% for walls, 0.0198% for floors, and 
0.0433% for columns, with no discrepancies for windows and doors. The variations in distance and volume 
displayed a near-perfect reproduction. It is postulated that such differences may arise from metric and imperial 
unit conversions. 


The models were contrasted visually in 3D, along with elevation, section, and plan views. The manual analysis 
allowed the authors to pinpoint inaccuracies in the reproduced model. As highlighted by the red ellipses in part (b) 
of Fig. 5, the curtain wall profiles were inaccurately depicted. Part (c) of Fig. 5 overlays drawings from the original 
and reproduced BIM models, delineated by black and red lines respectively. The comparison revealed three main 
types of inaccuracies: 


Custom attributes: In this version of the implementation, custom attributes, such as distances between vertical 
mullions, were not supported. This omission led to mullions in the reproduced model being placed with default 
values. The issue was observed in four curtain wall elements. 


Automatically connected elements: In instances where building elements overlapped, their placements varied 
occasionally. The function in Revit that automatically joins closely placed elements generates minor discrepancies 
in the lengths of the automatically joined elements, depending on how far apart the joined elements were initially. 
The issue was observed in two interior wall elements. 


Unknown reasons: There were instances where the wall elements with specific profiles did not align precisely 
with the original design. These discrepancies may be attributed to limitations within the logging and reproduction 
algorithm. This issue was observed in six profile wall elements. 


In summary, 12 of the 159 elements (or 92.45%) in the reproduced BIM models were inaccurately represented—all 
being wall elements. Despite these discrepancies, the enhanced BIM logger effectively logged most of the 
necessary information to recreate the BIM authoring process. 


7. CONCLUSION 


This study developed an enhanced BIM logger that captures the necessary information to reproduce the BIM 
authoring process. By analyzing the information requirements of five representative building elements in Revit, 
the authors developed a custom logger that records geometric shapes and attribute values. The study also developed 
a reproduction algorithm to repeat the BIM authoring process. The effectiveness of the enhanced log was evaluated 
in a case study of Villa Savoye designed by Le Corbusier, which showed that the enhanced BIM logger provides 
a valuable tool for capturing and reproducing 92.45% of the building elements generated and modified within the 
BIM authoring process. While minor discrepancies and misrepresentations were observed, the results of the case 
study demonstrated the potential of the enhanced BIM logger. 


Looking ahead, avenues for improvement lie in broadening the spectrum of building element categories or attribute 
values to augment reproducibility. Additionally, the applications of enhanced BIM logs beckon exploration, as 
does the prospect of analyzing the as-happened BIM authoring process (Shin et al., 2022; Yarmohammadi & 
Castro-Lacouture, 2018) and automating the BIM authoring process using such logs (Pan & Zhang, 2020). Overall, 
the enhanced BIM logger presented in this study can contribute to elevating the transparency and efficiency of the 
BIM authoring process, serving as an invaluable data source for its enhancement. 
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ABSTRACT: While the advantages of leveraging advanced technologies and Industry 4.0 for effective safety 
management have been extensively recognized, the journey towards a more mature integration of Industry 4.0 
technologies into safety management practices often lacks a well-defined and systematic guidance map. This 
research is a step towards providing organizations with a structured approach to navigate and achieve successful 
safety management transformation, enabling them to fully harness the potential of industry 4.0 technologies in the 
workplace. Two rounds of systematic literature reviews (SLRs) are conducted to narrow down the number of 
articles based on the PRISMA method, which are then subjected to further content analysis. The study highlights 
the integration of Industry 4.0 technologies within the domains of People, Process, and Policy and their 
significance in advancing safety maturity. This research uncovered key themes, providing valuable insights that 
will shape the conceptual maturity model structure of safety management based on the innovative nature of 
Industry 4.0 to enhance their safety culture to align with. The results provide a fertile ground for a Smart Safety 
Maturity Model, to integrate technologies to elevate safety drivers in construction safety management. 


KEYWORDS: Industry 4.0, Maturity model, safety management, construction industry 
1. INTRODUCTION 


The fourth industrial revolution, also known as Industry 4.0, in the construction industry, has been conceptualized 
as the application of innovative technologies and processes to improve the deliverance of tangible and intangible 
services within construction companies (Kumar et al., 2019). Construction contractors are adopting various 
technologies including robotics, advanced data analysis, immersive technologies, additive manufacturing, 
autonomous systems, cloud computing, cybersecurity, and the Internet of Things (Nnaji et al., 2019; RüBmann et 
al., 2015; Smallwood & Allen, 2023). Although the construction sector increasingly adopts innovative 
technologies that go beyond conventional practices to address occupational Safety and health (OSH) constraints, 
there is no paved avenue for measuring their progress, benchmarking against standards and regulations, and 
pinpointing areas where efforts should be focused to achieve effective outcomes. 

Maturity models (MMs) offer a structured transformational roadmap for organizations adopting new strategies like 
Industry 4.0 technologies for safety objectives and evaluate the current maturity level and plan for improved future 
performance (Alankarage et al., 2022; Paulk, 1995; Wendler, 2012). The structure of MMs is commonly organized 
into five stages or levels: initial, repeatable, defined, managed, and optimizing (Das et al., 2023; Rashidian, 
Drogemuller, Omrani, et al., 2023). The MMs also have a series of attributes and sub-attributes, mainly covering 
hard (technology-related aspects) and soft (human-related aspects) attributes (Rashidian, Drogemuller, & Omrani, 
2023). However, their application has since expanded to various disciplines, including the construction industry 
(Rashidian et al., 2022). The current study heavily relies on using established Safety MMs available in the literature 
to understand the digital readiness in the existing safety MMs. The review of the MMs in the construction field by 
Rashidian et al. (2022) revealed that safety maturity is one of the key focus areas, covering two major themes 
safety culture and safety climate (Wilson & Koehn, 2000). Safety culture refers to the underlying beliefs and values 
that influence organizational behavior, while safety climate pertains to the attitudes and views of the workforce at 
a particular moment Griffin and Curcuruto (2016). While safety maturity models offer a structured approach for 
evaluating organizational safety progression, there exists a gap in understanding how the integration of 
technologies can effectively elevate safety maturity within construction processes. This gap raises the question of 
how technology can be strategically employed to benchmark and enhance safety practices. By addressing this 
research problem, organizations can better understand the synergistic relationship between technology adoption 
and safety progression, leading to optimized safety practices, improved risk mitigation, and ultimately safer 
working environments. The overarching aim of this study is to comprehensively analyze the correlation between 
Industry 4.0 technologies and the attributes of a construction safety maturity model. This involves a dual focus: 
firstly, to pinpoint the precise Industry 4.0 technologies that play a role in enhancing construction safety; and 
secondly, to conduct an exhaustive investigation into how these technologies are integrated within the existing 
Safety Maturity Models (MMs). By delving into the intricate relationship between Industry 4.0 technologies and 
construction safety maturity attributes, this study enhances the knowledge base surrounding the potential benefits 
and challenges of adopting Industry 4.0 technologies. It provides organizations with valuable insights into the 
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ways in which technology can be harnessed to benchmark and enhance safety practices, ultimately leading to more 
effective risk mitigation strategies. 


2. RESEARCH METHODOLOGY 


A systematic literature search was conducted. Title, abstract, and keywords of peer-reviewed articles published 
from 2013 onwards. Our search was conducted exclusively within the cross-disciplinary database Scopus, which 
is recognized as the largest archive of peer-reviewed publications, including scientific reviews, collections of 
academic works, and conference proceedings (Hijazi et al., 2021). The identification phase of the PRISMA model 
was used as a framework for extracting relevant publications (Moher et al., 2009). The primary criteria for the 
initial filtering stage were defined, encompassing factors such as English language, academic relevance, and full- 
text accessibility. Subsequently, a preliminary assessment of content was carried out to ensure the presence of the 
required keywords within the titles and abstracts of the publications. Additionally, a reverse search approach was 
employed during this phase of searching and screening. Baker et al. (2023) mention to the approach as the 
"snowballing" technique which allows the acquisition of papers utilizing cross-references from the selected 
publications to minimize the impact of missing relevant resources. The systematic literature reviews (SLRs) were 
conducted through a two-phase approach, which is detailed in the subsequent subsections. 


2.1 phase 1 


We aimed to identify the utilization of maturity models in the realm of construction safety management. Following 
the removal of irrelevant articles, the total count of articles has been reduced to 11. Table 1 demonstrates how the 
connection between Boolean operators and the result of the search is represented in order to locate pertinent 
scientific articles. 


Table 1: The PRISMA Identification stage results from searching the database (SCOPUS) 


Search 1 Search Results 
Boolean operators (Searches done in July 2023) Scopus 
((“maturity model" OR "maturity framework") AND ( "safety" OR "safety management" OR "risk" 29 


OR "risk management" OR "hazard" OR "accident" OR "accident prediction" OR "accident 
prevention" ) AND ("Construction site" OR "construction jobsite" OR "construction work zone" OR 
"construction industry" OR "construction workplace" OR "construction work*" OR "construction 
professional*" OR "construction labo*" OR "construction workforce*" OR "construction staff" OR 
"construction personnel*" OR "construction activit*" )) 


2.2 phase 2 


Phase 2 is dedicated to retrieving papers associated with Industry 4.0 and safety management. The Scopus database 
yielded a total of 48 papers as shown in Table 2. Following a thorough evaluation of extracted papers, 26 papers 
were excluded based on the content of abstracts, as they were identified as irrelevant to the scope of the research. 
Then, a content analysis approach was employed to categorize highly interconnected terms, relying on researchers' 
semantic comprehension, to identify the various dimensions associated with them (Das et al., 2022). Due to the 
absence of a universally accepted definition of Industry 4.0, there is often a reliance on related terms to describe 
this paradigm (Das et al., 2022). Incorporating the keywords associated with each Industry 4.0 technology in the 
search led to a significant increase in the number of articles retrieved. Furthermore, a considerable portion of the 
papers examined did not propose dimensions that could be compared to existing safety maturity models. To ensure 
important references were not missed, manual searches of the references in the included studies and review articles 
were performed in addition to the electronic searches. Ultimately, through the process of reference tracking, an 
additional seven review papers were identified which covered the different applications of technologies in 
construction safety management, resulting in a total of 29 articles included in the review. 
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Table 2: The PRISMA Identification stage results from searching the database (SCOPUS) 


Search 2 Search Results 
Boolean operators (Searches done in July 2023) a 
("industry 4.0" OR "4ir" OR "fourth industrial revolution") AND ( "safety" OR "safety management" OR 48 


"risk" OR "risk management" OR "hazard" OR "accident" OR "accident prediction" OR "accident 
prevention" ) AND ( "Construction site" OR "construction jobsite" OR "construction work zone" OR 
"construction industry" OR "construction workplace" OR "construction work*" OR "construction 
professional*" OR "construction labo*" OR "construction workforce*" OR "construction staff" OR 
"construction personnel*" OR "construction activit*" )) 


As can be seen in Fig 1., by combining the findings from Search | and Search 2, the findings and discussion were 
conducted to encompass the specific requirements of construction safety management while also addressing the 
essential elements needed to effectively navigate the challenges and opportunities presented by Industry 4.0. A 
total of 11 maturity models and 29 papers were curated from the outcomes of Search 1 and Search 2, 
correspondingly. Subsequently, each phase was independently analyzed to identify shared elements between them 
and how Industry 4.0 technologies improve the maturity of construction safety. 


Search 1 


How technologies can contribute to safety maturity in the construction industry. 


Search 2 


Fig 1. The goal of combining the findings from two-phase searches 


3. CONTENT ANALYSIS 


3.1 Publication distribution by safety maturity model attrinutes 


Fig 2. has illustrated the distribution of publications based on their viewpoint on maturity models focusing on 
components. A total of 11 construction safety maturity models were extracted from phase one, each with distinct 
areas of emphasis classified under three main categories: people, process, and policy. The level of emphasis placed 
on regulation and standards maturity within models had the lowest hit when compared to other components. The 
apportionment of maturity models focus proximately reveals that the majority of maturity models' elements in the 
construction sector predominantly concentrate on managing various processes, while the second most prominent 
focus pertains to people as a foundational area in safety maturity models. 
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3.2 Publication distribution by industry 4.0 technologies application 


The allocation of focus in Industry 4.0 technologies in Fig 3. which is extracted from phas two of search indicates 
that an equal proportion of attention, 39% each, is dedicated to both people and processes in the construction 
sector. These two areas are considered equally significant in the application of advanced technologies in 
construction safety management. The level of emphasis on improving regulation and standards through 
technologies was relatively limited comprising only 22% of the overall focus. 


Process 
39% 


Fig 3. Publication distribution by industry 4.0 technologies applications 


3.3 Publication distribution by industry 4.0 technologies type 


Fig 4. in the literature encompasses a representation of the terminologies associated with various technologies. 
Among these, Artificial Intelligence (AI) and Industry 4.0 technologies have emerged as the most prominent and 
frequently referenced terms. The application of these technologies to aid safety challenges related to people, 
processes, and policy promote safety maturity in the construction process. 
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Fig 1. Publication distribution by industry 4.0 technologies type 


3.4 Keywords co-occurrence in safety maturity models: insights from phase one search 


A keyword network provides a visual representation and showcases the interconnectedness and intellectual 
organization of various research themes (Van Eck & Waltman, 2014). There are no universally defined guidelines 
for determining the frequency at which keywords should occur (Wuni et al., 2019). The frequency of the 
occurrence of the keywords as can be seen in Fig 5., among 28 keywords, the most important topics in the domain 
of construction safety maturity models extracted from phase one of search are related to cluster 1 labeled in red on 
the map had 8 members with keywords such as lagging indicators (e.g., job site safety audits, safety training, pre- 
task hazard analysis, and safety incentive program), leading indicators (e.g., recordable injury rate, lost time injury 
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rate, and OSHA citations and fines), contractor selection, and decision making. Cluster 2 in the green region of 
the map had 7 items with keywords such as hazard awareness, hazard identification, safety accident, construction 
industry, occupation safety, safety management, labor, and personnel issues. Safety culture is the most important 
topic which has 18 links with other clusters. 


contra ct@pseid@tion 
leading oxicarg 


Decision making 228)" 2M@picators 
choosing by aflvantages (cba) 


constru@ign safety safeebiture E ey 
AN a safety leadership 


i 
Building Information Modelling se i o Aa 
hazard aiareness aime industry 
Occupatign saber, magement 
hazal id@ptification 
fe VOSviewer 


Fig 2. Co-occurrence of keywords of safety maturity models extracted from phase one of the search 


3.5 Keywords co-occurrence in application of industry 4.0 technologies in construction 
safety: insights from phase two search 


In Fig 6., the first cluster, indicated by the color red, represents the relevant technologies in the field of construction 
safety management such as artificial intelligence, automation, big data, digital twin, internet of things, machine 
learning, etc. The second cluster green shows links between Safety, technologies, and the construction industry, 
and cluster 3 dark blue shows keywords related to Industry 4.0 terms and their relationship to other clusters. The 
purple color cluster demonstrates how safety management has links with specific technologies such as sensor 
technologies, real-time locating systems, and visualization technologies. The last health and Safety, depicted in 
yellow, illustrates the opportunities and challenges arising from the adoption of technologies in the construction 
safety management domain after the fourth industrial revolution. 
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Fig 3. Co-occurrence of keywords of application of industry 4.0 technologies in construction safety 
extracted from phase two search 
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4. DISCUSSION 


The thematic analysis of the drivers in the studied MMs revealed three major themes: Process, People, and Policy. 
This categorization aligns with other related literature, although slight variations exist. For instance, Succar (2010) 
utilized a BIM MM model and classified the drivers into three primary themes: People, Processes, and 
Technologies. Another context, as presented by Orogun and Issa (2022), involved categorizing the drivers in the 
health and safety assessment tool for Sustainable building projects, with the major themes being Building, Process 
and People. These findings emphasised the recurring significance of the Process and People themes in driving 
success in various domains while also highlighting the role of other drivers, such as technology or Buildings, in 
certain contexts. 


4.1 Identification of safety drivers of maturity models in construction safety management 


A body of literature focused on maturity models for Occupational Health and Safety (OHS) within the construction 
sector emphasizes the critical significance of leadership, commitment, engagement, effective communication, 
competence, and well-defined procedures as crucial elements in attaining maturity in this domain (Musonda et al., 
2021). Assessing the maturity of Occupational Health and Safety (OHS) in construction projects also relies on the 
indispensable use of information and technology resources, along with facilitated collaboration made possible by 
technology (Poghosyan et al., 2020). Safety maturity also serves as a basis for evaluating safety performance 
(Karakhan et al., 2018; Oswald & Lingard, 2019) as well as disability management (Quaigrain & Issa, 2021). 
Construction frontline leaders play a crucial intermediaries role in transmitting messages between top-level 
management and workers, as well as bridging the gap between office-based and site-based people (Oswald & 
Lingard, 2019). safety leadership is a key factor in assessing causative incident factors and without supporting of 
leaders, workers are unable to effectively advocate for and implement safety behaviors (Indrayana et al., 2022). 
To establish a framework for evaluation and improving such practices, Oswald and Lingard (2019) developed a 
three-stage maturity model for revealing the relationships between foremen and subcontractor supervisors, the 
leadership styles of foremen and supervisors, the relationship between foremen and workers, the interaction 
between subcontractor supervisors, effective workgroup communication, and the relationship between frontline 
leaders and H&S advisors. Karakhan et al. (2018) also suggested a decision-making framework to assess the safety 
maturity of construction contractors which has seven main factors including safety leading and lagging indicators, 
Safety and supervisory personnel, system maturity and resiliency, preconstruction services, technology and 
innovation, and safety culture. Albert et al. (2014) suggested a maturity model for hazard recognition of workers 
to assist unanticipated hazardous conditions. A comprehensive framework in the work of Asah-Kissiedu et al. 
(2021) shed light on the various aspects of Safety, health, and environmental (SHE) management in construction 
operations. Table 3 illustrates the coverage of construction safety maturity research, showcasing components from 
different stages of construction safety management. 


Table 3. Safety Drivers of maturity models in construction safety management 


Reference Maturity model Safety drivers Evaluation Evaluation style 
scope 
Ind tal., : j ; People Organization Self-assessment 
(ndrayana et a Frontline H&S leadership maturity 
2022; Oswald & del 
mode 
Lingard, 2019) R 
(Quaigrain & Issa, Disability management Process, Policy Organization Self-assessment 
2021) Performance management 
Construction Hazard Recognition Process, People Construction Self-assessment 
(Albert et al., and Communication with Energy- SEEN 
2014) Based Cognitive Mnemonics and 
Safety Meeting Maturity Model 
(Asah-Kissiedu et Safety, health, and environmental Process, People Organization Self-assessment 
al., 2021) management capability maturity 
mode (iSHEM-CMM) 
(Olugboyega & Building information modeling— Process, People, policy Organization Self-assessment 


Windapo, 2019) enabled construction safety culture 
and maturity model: A grounded 
theory approach 
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(Lee, 2019) Safety culture maturity model Process, People, policy Construction Self-assessment 


Site 
(Poghosyan et al., Design for occupational Safety and Process, People, policy Organization Self-assessment 
2020) Health: A capability maturity 
model 
(Orogun & Issa, Evaluation of the health and Safety Process, policy Project Self-assessment 
2022) of sustainable building projects 
(Karakhan et al., Safety maturity model of Process, People Organization Self-assessment 
2018) contractors 


4.2 Application of industry 4.0 technologies in construction safety management 


Numerous investigations have explored the application of technologies in the domain of construction safety 
management (Asadzadeh et al., 2020; Babalola et al., 2023; Fargnoli & Lombardi, 2020; Sadeghi et al., 2021; 
Soltanmohammadlotu et al., 2019). These investigations delve into various facets of the safety management process 
such as visualizing construction activities and hazards, data gathering, integrating health and Safety into 
construction activities, monitoring noncompliance, determining accident costs, linking requirements to 
construction activities, integrating health and Safety into the construction process, connecting information to 
construction activities, mitigating worker hazards, assessing health and safety costs, and monitoring health and 
Safety in construction processes and activities (Smallwood & Allen, 2023). According to Khodabakhshian et al. 
(2023), the correlation of big data, digital technologies, and artificial intelligence paves the road for covering 
various aspects of construction safety management. Based on extracted data in the content analysis section, there 
are three key focal areas of studies about the application of Industry 4.0 technologies in construction safety 
management. 


4.3 Industry 4.0 technological solutions for process-related safety challenges 


The integration of various technologies has forged paths toward embracing the safety challenges in the 
construction process (Statsenko, Samaraweera, Bakhshi, & Chileshe, 2022). The prevailing technology-driven 
solutions encompass a range of methods, including automating safety planning (such as job hazard analysis, safe 
work method statements, plan and design review, and organisational safety performance) through visualization 
technologies. Sensor-based location technologies play a significant role, as does on-site safety management 
(including optimizing safety processes, proactively preventing accidents, and enhancing the repository of safety 
knowledge), achieved, for example, through the integration of Big Data with Building Information modeling 
(BIM) or semantic web technology. Additionally, safety training, safety outcomes assessment, safety monitoring, 
safety program costs, and real-time hazard identification are facilitated through the utilization of technologies such 
as real-time location tracking, augmented reality (AR), and virtual reality (VR)(Pedro, Pham-Hang, Nguyen, & 
Pham, 2022)(Akanmu et al., 2021; Asadzadeh et al., 2020; Babalola et al., 2023; Fargnoli & Lombardi, 2020; 
Franco et al., 2022; Guo et al., 2017; Li et al., 2018; Malomane et al., 2022; Oke & Arowolya, 2022; Perrier et al., 
2020; Regona et al., 2022; Smallwood & Allen, 2023; Soltanmohammadlou et al., 2019; Statsenko et al., 2022; 
Wen & Gheisari, 2020). 


4.4 Industry 4.0 technological solutions for people-related safety challenges 


Industry 4.0 technologies find application in various domains of safety maturity elements, aiding in multiple areas. 
These include recognizing unsafe behavior, detecting worker hazardous motions, monitoring physiological 
indicators, capturing worker responses, facilitating communication-based safety, assessing worker capabilities, 
providing operator aids, enhancing the safety management system of main contractors, qualifying manufacturers, 
supervising main contractor site activities, nurturing workers' safety values, fostering safety culture, evaluating 
safety climate, and managing worker job stress. These domains benefit from the integration of Industry 4.0 
technologies, including real-time locating systems, visualization technology, Internet of Things (IoT), wearable 
technology, and etc. (Awolusi et al., 2018; Fang et al., 2020; Franco et al., 2022; Guo et al., 2017; Malomane et 
al., 2022; Panteli et al., 2020; Sadeghi et al., 2021; Soltanmohammadlou et al., 2019; Statsenko et al., 2022). 


4.5 Industry 4.0 technological solutions for policy-related safety challenges 
Health and safety noncompliance, safety standards requirements, adherence to site safety rules, evaluation of 
equipment operators' compliance, and compliance with safety regulations form key areas of Industry 4.0 


integration. This integration is facilitated through diverse technologies like digital twins, 4D models, real-time 
locating systems, Internet of Things (IoT) and etc within the framework of safety policies (Fargnoli & Lombardi, 
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2020; Franco et al., 2022; Malomane et al., 2022; Oke & Arowoiya, 2022; Panteli et al., 2020; Patrucco et al., 
2020; Sadeghi et al., 2021). 


4.6 Conclusion and future work 


This research aims to determine how the identified Industry 4.0 technologies currently contribute to safety maturity 
models in the construction industry. The key findings of this research reveal a structured approach to advancing 
safety maturity within the construction industry in the context of Industry 4.0. Our findings highlight the paramount 
importance of integrating Industry 4.0 technologies within the realms of Process,People, and Policy to advance 
safety maturity. Additionally, it is essential to explore how these technologies offer solutions for challenges 
associated with processes, people, and policies in safety management.This categorization is mirrored in the 
classification of Industry 4.0 technology implementation, illustrating a direct alignment between safety progression 
and the adoption of Industry 4.0 technologies. The aim is to systematically organize the transformation of 
construction safety maturity models within the context of Industry 4.0 by identifying critical aspects of the 
contribution of technologies. Furthermore, the research underlines the pivotal role of technology, emphasizing its 
multifaceted contribution to enhancing safety practices and overall maturity within the construction sector. 

In the existence of construction safety maturity model, there is no structured way to measure the contribution of 
technologies to attributes of safety maturity. The utilization of technology in contributing to the maturity level of 
safety in the construction process can also indirectly provide advantages to policymakers and government bodies 
to present the state of safety management in the construction industry in their region and establish specific goals 
and regulations accordingly. However, most Industry 4.0 technologies in the domain of construction safety 
management are immature and have not yet been comprehensively developed. This study paved the way for 
developing a Smart Safety Maturity Model which integrates technologies for improving safety in the construction 
process. 
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ABSTRACT: The development of methods for building permit issuing supported by digital tools could improve 
the current mostly manual procedures for processing regulatory information and related compliance processes. 
Several studies are currently addressing the challenge of building permit digitalisation, mostly considering 
building information models as the source data for automating the regulations checks. However, many of the main 
checks, that usually represent the major bottlenecks of the compliance checking process, need a joint 
representation of the new proposed construction and its context, which could be effectively represented in a (3D) 
geographical information system. This study aims at supporting the automation of building permitting by 
addressing the rule interpretation as an input to model preparation and code checking. In particular, the 
regulations interpretation in this case is functional to the definition of data requirements and checking rules 
referring to a joint GIS and BIM (GeoBIM) framework. The approach is developed and tested in the case of an 
Italian municipality of 45.000 inhabitants. This paper describes the interpretation of distance-related regulations 
by adopting a semantic mark-up and sentence-centric approach. The resulting level of information need has been 
represented in conceptual models (object, attributes, relationships) as an essential input to city and building model 
preparation. While the case study is specific in location and regulations, the type of issues encountered are a 
generally applicable example for the building permit use case. Future works will extend the methodology to 
additional three European municipalities between 45.000 and 1.000.000 inhabitants, in three European countries, 
to address the need for a flexible and scalable approach. 


KEYWORDS: Digital building permit; Rule interpretation; GeoBIM; Information requirements; Building-urban 
interaction. 


1. INTRODUCTION 


A building permit is an authorisation required to start the construction phase of a building and it is granted by 
public authorities after verifying that the design proposal complies with construction regulations at building and 
urban levels (Noardo, et al. 2022). Building permit checks are traditionally a time-consuming and manual 
procedure for municipalities and the process is recognised as poorly effective due to multiple factors including, 
but not limited to, the technical knowledge of public officers in the assessment and the high demand for building 
permits to be inspected, troubled by the lack of adequate personnel (Fauth & Soibelman, 2022). Also, procedures 
for the issuance of the permit tend to be complex because of having to adapt to frequent legislative updates 
(Malsane, et al. 2015). 


The development and connection of methods for building permit issuing supported by digital tools could improve 
the current as-is manual procedures for processing regulatory information and related compliance processes. With 
the increased adoption of Building Information Models (BIM) in building design processes, several municipalities 
are investing in automating these checks both by using BIM methods and tools, but also by increasingly integrating 
them with the geographic datasets at their disposal (Hobeika, et al. 2022). Research in the adoption of BIM to 
design verification for regulatory compliance is not recent - think of the discussion on BIM-based rule checking 
proposed by Eastman, ef al. (2009). However, until a few years ago, research works were mainly focused on 
analysing building data, including their conversion in building-related neutral data schema as Industry Foundation 
Classes (IFC), rather than integrating BIM and Geographic Information System (GIS) (i.e., GeoBIM) (Hobeika, 
et al. 2022). 


It is clear how important checks for the issuing of building permits require constant interaction between data on 
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the building for which permission to proceed with the construction phase is sought — to be retrieved in the building 
information model - and information about the urban context in which it is located — to be retrieved in geo-data 
sets or 3D city models. This means that without adopting a GeoBIM approach some checks would require users 
to manually add information that should be instead automatically extracted from geo-data sets. For example, many 
urban parameters depend on urban zones, a new building must comply with minimum distance criteria from 
existing ones, interaction with existing public or private facilities must be considered, as well as public transport, 
parking standards and so on (Hobeika, et al. 2022). For this reason, according to recent studies on digital building 
permit (DBP), the automation of the checking of those kinds of regulations can only be effective if both geo-data 
and building data are considered. At present, the adoption of a GeoBIM approach represents a significant challenge 
of the DBP use case (Arroyo Ohori, et al. 2018). Recent literature, especially from 2016 (Noardo, et al. 2022), 
investigate the management of geo-data by available standards (e.g., CityGML) (Guler & Yomralioglu, 2021) and 
GeoBIM interoperability (e.g., conversion of 3D city models to BIM) (Nebras, et al. 2020) as essential steps for 
allowing designers to consider geoinformation as a suitable reference (Noardo, et al. 2022). 


This paper focuses on rule interpretation, the process of conversion of the natural language of city and building 
regulations into computable parameters and constraints. Within the broader research framework, the methodology 
adopted for the interpretation of distance-related regulatory requirements and their formalisation is described in 
Section 2. Distance-related checks are one of the examples of regulatory checks that need the adoption of an 
integrated GeoBIM approach to be automated. Results from the interpretation of regulatory requirements from a 
specific case study are described in Section 3, including the formalisation of the relevant level of information need 
as a preliminary input to city and building model preparation and code checking (Section 4). Finally, limitations 
of the study, which is currently ongoing, are discussed along with future contributions (Section 5). 


2. RESEARCH FRAMEWORK AND METHODOLOGY 


The research considers the digitisation of the building permit use case for an Italian municipality of 45.000 
inhabitants (i.e., Municipality of Ascoli Piceno). It has been explored in close collaboration with the municipality, 
and it is based on the field experience of its own officers. First, the list of checks to be digitised for effectively 
supporting the building permit process has been defined. Then, regulations among the ones that were deemed 
likely to have the best advantage from GeoBIM have been considered as a priority by municipality officers. They 
include: maximum buildable urban volume, buildability index, covered area, coverage ratio, maximum building 
height, building protrusions on public streets and squares, parcel's fence height, distance from other buildings (i.e., 
building-building distance), distance from the parcel boundaries (i.e., building-parcel boundaries distance), 
building-road distance, parking standards (dimensions/area and n. of parking spaces). 


In this paper the results regarding distance-related regulations are described. Those regulations were selected, in 
consultation with municipality experts, because these were judged to be among the most important ones to be 
implemented. Distance-related regulations have been classified as follows: building-building distance, building- 
parcel boundaries distance and building-building distance if a road is interposed. The specific text of the regulation 
has been analysed with the aim of translating it into a set of information requirements for the DBP use case and, 
on the other hand, into a machine-readable format. The former objective lies within the scope of this paper, while 
the development of a pseudocode, developed based on this initial discussion, will be a future work of this research 
which is still ongoing. 


2.1 Rule interpretation 


Rule interpretation represents the first step of the DBP workflow (Figure 1) but it is also one of the main challenges 
in such a use case. The relevant information in documents such as public laws, codes and regulative standards 
should be captured in a time and cost-effective way to be able to adopt rule checking effectively (Noardo, et al. 
2022). However, the complexity of the natural language used for regulatory requirements and its interpretation 
into data sets for supporting the adoption of a digital approach represents an open issue for automating and 
digitising design compliance checking and, specifically, building permitting. Malsane, et al. (2015) describe how 
knowledge formalisation of building codes could provide “suitable, significant and required data for the 
development of the building regulation-specific object modelling”. They claimed how the formalisation of building 
regulations should include the classification of regulation clauses into “those which are computer-interpretable 
(declarative) and those which are not (informative)”. The former provides a direct meaning to be interpreted (e.g., 
simple geometrical rules which when applied to an element can return true or false), while the latter contain data 
only partially suitable for interpretation into computer rules that can be processed (e.g., information is not obvious 
as checkable, needs human interpretation to understand the exact content and meaning). Finally, a remaining 
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category of clauses exist that can be considered as unsuitable for automated compliance checking. 
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Fig. 1: Overview of considered DBP workflow steps assigned to involved parties (Noardo, et al. 2022) 


Several rule interpretation processes are described in literature, both manual-based and automatically enabled, to 
create computable representations of normative data (Dimyadi, ef al. 2017). Studies exist that propose methods to 
automatically interpret the natural language of regulations transforming them to code to check the proposed 
building design, as represented in building information models (e.g., Song, et al. 2018; Zhang & El-Gohary, 2017). 
However, challenges appear long before the conversion of this information from natural language to any formal 
language since even for human beings the interpretation of the text, tables and graphical contents of regulations 
can be open to different interpretations (Noardo, et al. 2020). According to Zhang, et al. (2023), the ambiguity of 
building requirements is one of the main issues since it could prevent their accurate interpretation and automated 
checking. They discussed how some ambiguous clauses in building requirements “reflect regulators’ intention 
while others are unintentional, resulting from the use of language or tacit knowledge”. Even if the rule 
interpretation process could rely on the programmer’s interpretation and translation of the written rules into 
computer code (Eastman, et al. 2009), in most cases the logic of the human language statements is first formally 
interpreted and then translated. In fact, because of the complexity and subjectivity in nature of building regulations, 
as just mentioned, building regulation experts (e.g., municipality officers) need to be involved in their conversion 
to computer interpretable rules to ensure the correct interpretations (Malsane, et al. 2015). 


In this study, a sentence-centric approach and a semantic mark-up procedure have been adopted for rule 
interpretation. Each regulatory article that municipality officers considered as significant for distance-related 
checks was interpreted (i.e., sentence-centric approach). To this end the Requirement, Applicability, Selection, and 
Exception (RASE) methodology has been adopted for deconstructing rule sentences and to extract semantics from 
building regulations for compliance checking (Hjelseth & Nisbet, 2011; Nisbet, et al. 2022). The content of 
regulatory requirements has been interpreted by dividing each sentence into these four basic components to support 
formalisation (i.e., semantic mark-up procedure). Such an approach to rule interpretation can be seen as a pre- 
processing step as in this case the result cannot be directly interpreted by the computer (Preidel & Borrmann, 2018). 


Moreover, a relevant step to solve ambiguities has been performed by the organisation of a specific meeting with 
the involvement of municipality officers who usually check and then release or deny the building permits 
themselves. The details and meanings of the distance-related regulations were explained and agreed upon and 
specific questions were asked about ambiguous statements. Moreover, some more details arising during the 
following work towards implementation were asked later. That was the only way to have ambiguities about the 
regulation solved. This issue already showed how little the current state of regulations lends itself to automation 
(Noardo, et al. 2020). 


After that, the regulation was formalised step by step. A table containing the metric phrases identified through the 
RASE labelling has been compiled and the following data set has been provided (Hjelseth & Nisbet, 2011; 
Tomczak, et al. 2022): the object to which the rule refers, the type of information to be verified (i.e., property), the 
data type (e.g., text, number, boolean) of the required properties, the comparison (e.g., <, >, contains), the value to 
be compared (i.e., target value), the unit of measurement that the value should have (i.e., for numerical values) 
and, if necessary, dependencies and application conditions. Plain text description could be added to support 
unambiguous definitions and information requirements could also demand a particular level of detailedness to 
support building and city model preparation, meaning what needs to be modelled and to what precision (i.e., 
geometrical data) (Tomczak, et al. 2022). 
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2.2 Information conceptual models 


Based on the results from rule interpretation, information requirements have been identified and formalised in 
terms of information conceptual models (object, attributes, relationships) as an essential input to city and building 
model preparation (Zhang, et al. 2023). Both building and city models are sources of information involved in 
automated compliance audit processes if a GeoBIM approach is adopted. They respectively represent the building 
design to be audit - usually developed by the professional responsible for applying for a building permit on behalf 
of the applicant - and the urban context in which it is located. Another source in the automated compliance checking 
process is represented by normative clauses structured in machine-readable formats, which is not within the scope 
of this paper. 


Conceptual models, in the practice of database design, are intended to formally represent the information to be 
stored in a database. The database design methodology defined by the ANSI/X3/SPARC standard defines four 
levels of data modelling: external model, representing a simple narration of requirements for information 
representation of the database; conceptual model, abstracting and formalising the information requirements in the 
external model into objects (i.e., entities or classes), the respective properties or characteristics (attributes) and 
reciprocal relationships; logical model, converting the conceptual model into rules more computing oriented; and 
finally the internal, or physical model, corresponding to the actual implementation format of the database (Laurini 
& Thompson, 1992). 


In this study, the information about requirements was collected from the regulations, and the mark-up phase 
through the RASE methodology facilitating the formalisation of the regulatory contents was one intermediate step 
supporting the definition of data requirements as conceptual models, which were represented following the Unified 
Modelling Language (UML) (OMG, 2023). Objects and related attributes needed for distance-related checks in 
the DBP use case for the considered municipality have been identified. The representation includes data types and 
specifies if data are directly extractable from the model (e.g., the height of the object) or whether they need to be 
entered manually by the designer. For numerical value the type is specified (e.g., integer, real, float), while for 
textual data allowed classifications are specified from which data values can be chosen (e.g., the list of actual 
intended uses for the municipality’s urban zone from which a designer has to choose before submitting its design 
to the building permit procedure. Depending on the type of intended use of the urban zone, in fact, some regulatory 
constraints will apply over others). The conceptual model representation also allows the type of relationship 
(aggregation relationship, composition relationship, etc.) to be specified. 


Finally, the development of the conceptual models supports an easier comparison and harmonisation of the 
information requirements according to the same type of check (e.g., distance-related check) applied in different 
municipalities or according to different types of checks applied by the same municipality. This will facilitate the 
mapping to standards for representing BIM and GIS-related information consistently in future steps. To this end, 
as a preliminary step, for the objects in the conceptual model proposed in this paper it is specified whether they 
should belong to the BIM or GIS information representation. 


3. RESULTS FROM THE INTERPRETATION OF REGULATORY REQUIREMENTS 


The regulation considered here is the Regolamento edilizio comunale (i.e., Building code) of the Municipality of 
Ascoli Piceno. Municipality officers pointed out Article 61 of the text, reported here in paragraphs 2, 3, 4 and 6, 
as important for distance-related checks. The text translated into the English language is available in Table 1. 


When considering this Article for formalisation, several examples of the complexity of the natural language of 
regulatory requirements emerges. For example, paragraph 2 refers to another regulatory text whose requirements, 
in relation to the one under analysis, are not made explicit. A reference to the “urban planning instrument”, which 
could contain additional requirements, is also mentioned in the paragraph. The prescribed minimum distance 
between two buildings is implicit in the text — referring to the “height of the tallest building” - and refers to another 
regulatory aspect concerning the maximum building height, for the explication of which it is necessary to refer to 
further definitions or regulatory articles. The same happens for the distance of the building from the boundaries of 
the parcel in which it is located (1.e., “the distance of a building from the parcel boundaries shall be equal to the 
half of the maximum permitted height”). Paragraph 4 also refers to additional definitions, such as the one of the 
distances of a building from a road: what does the road should contain as an object in city or building information 
models? 
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It is therefore essential to interpret not only the regulatory requirements but also definitions contained in building 
regulations and other regulatory texts that are relevant to the inspection in question. In 2016, an agreement was 
reached between the Italian state, Regions and the National Association of Italian Municipalities to adopt the so- 
called Standardized Building Regulations to simplify and unify actions in building matters. To this end, a set of 
forty-two uniform building-urban definitions was also developed, which represents the common glossary valid 
throughout the country. However, the process of transposing the Standard Building Regulations and their 
homogeneous definitions is still in progress and it is currently not possible to proceed with an unambiguous 
interpretation of them that has generalised validity. In this case, the regulation considered here is the Regolamento 
edilizio comunale of the Municipality of Ascoli Piceno — art. 13 (o, p, q). The text of these definitions translated 
into the English language is contained in Table 2, which also contains the graphical interpretation of the definitions 
as proposed by the authors and validated by municipality officers. The definition of building height (art. 13, 
paragraph m and n) had to be considered as well and it is contained in Table 3. 


Table 1: Building code of the Municipality of Ascoli Piceno. Art. 61 (i.e., distance-related checks) 


Paragraph Text of Normative Article 


2 In (c) areas of expansion referred to in Article 2 of Ministerial Decree No 1444 of 2 April 1968, published in the Official 
Gazette of 16 April 1968, No 97 between windowed walls of facing buildings, a minimum distance is prescribed, equal to 
the height of the tallest building and no lower than 10 m; if the facing facades overlap for more than 12 meters, the rule 
applies also when only one of them has windows. In the same urban zones, the distance of a building from the parcel 
boundaries shall be equal to the half of the maximum permitted height and in any case not less than 5 m. Construction on 
parcel boundaries is allowed, where permitted by the urban planning instrument, by agreement between the neighboring 


owners 


3 For all construction operations in other areas, the following minimum distances are prescribed: (building-building distance) 
(1) between windowed walls and walls of buildings in front of which at least one window: ml. 10; (building-parcel 
boundaries distance) (2) from the boundaries: ml.5 and unless otherwise prescribed by the general urban planning 


instrument. 


4 Minimum distance between buildings with roads in-between, excluding cul-de-sac roads serving single buildings or 
settlements, must be equal to the road width plus: (1) 5.00 m per side, in case of streets width lower than 7.00 m, (2) 7.50 
m per side, in case of streets width between 7.00 and 15.00 m, (3) 10.00 m per side, in case of streets width higher than 
15.00 m 


6 It will not be taken into account for the purposes of determining distances, overhang structures such as steps and open 
external stairs (maximum height 4), (1) gutter frames, open balconies and canopies, provided that the relative outline 
remains spaced from the boundaries at least by 1.50 m; (2) While account shall be taken of anybody closed in protrusion 


whatever is the adjective and at whatever height of the building it begins. 


Several ambiguities and uncertainties also arise for human interpretation. Those were mainly solved with the help 
of municipality officers in a meeting organised on the 17" of May 2023. Later, further ambiguities have been 
solved thanks to the continuous collaboration with the municipality, which was critical for the success of this initial 
phase. For example, the building-building distance can be considered radially or perpendicularly to the new 
building. Through discussions with municipality officers, it was defined that the segment defining the distance is 
perpendicular to the line of the building (see Table 2, paragraph o). Considering the building-road distance check, 
doubts emerged in relation to the “road furniture areas”. Talking with the municipality officers it was possible to 
define the entity of “flowerbed” to which the normative text refers (see Table 2, paragraph q). 


Results from rule interpretation have been formalised in a table containing the metric phrases identified through 
the RASE labelling as described in Section 2 (Figure 2). In Figure 2, the identified values for distances only apply 
to one urban zone, namely the c) areas of expansion. The first part of the article identifies the distance between 
two buildings with external windowed walls (art. 61(2) — case a), while in the second part, the overlap factor 
intervenes (x >12 m) (art. 61(2) — case b) and the distance is calculated between two walls of which at least one is 
windowed, so we have two cases, the existing building with a wall containing a window and the new building 
without windows or vice versa, the new building with a window and the existing building without (Figure 3). 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


Table 2: Distances-related definitions as defined in the Building code of the Municipality of Ascoli Piceno, Art. 13 


Regulatory requirement Graphical interpretation 


Paragraph o — Building-building distance. Is the distance 
(minimum) between the walls in front of the buildings, or 
buildings of the same, except for the walls on the interior spaces 
referred to in point (r) below, measured at the points of 
maximum protrusion. Two walls are considered to be facing 
when the angle formed by the extension of the same is less than 
70 degrees sexagesimal and the overlap is greater than 1/4 of the 
minimum distance of the walls themselves. For graded 


buildings, the distance is measured at each setback. 


Paragraph p — Building-parcel boundaries distance. It is the 
distance between the vertical projection of the wall of the 
building and the boundary line, measured at the point of 
maximum protrusion. It is understood as a border beyond the 


separation line of the different existing properties or the line 


defining the different lots or compartments of the 
implementation plans, as well as the line of delimitation of 
public areas for services or equipment identified in urban 


planning instruments. 


Paragraph q - Building-building distance if a road is interposed. 
It is the distance between the vertical projection of the wall of 
the building and the edge of the road, including sidewalk and 
public parking and road furniture areas. 


Table 3: Building-height definitions as defined in the Building code of the Municipality of Ascoli Piceno, Art. 13 


Regulatory requirement Graphical interpretation 


Paragraph m - Front height (H) - This is the height of any part of the 
elevation into which the building can be broken down, measured from the 
ground line to the roof line, taking into account the setback bodies if not 
included. The ground line is defined by the intersection of the wall of the 
elevation with the street level or the plane of the pavement or the plane of 
the ground at final settlement. The roof line is defined, in the case of a flat 
roof, by the intersection of the elevation wall with the plane corresponding 
to the extrados of the roof slab; in the case of a pitched roof, by the one 
intersection of the elevation wall with the plane corresponding to the 
extrados of the roof pitch. Unless otherwise specifically prescribed by the 
individual town-planning instruments, the height measurement does not 
take into account stairwells, lifts and flues, nor the increases corresponding 
to basement window wells or external accesses, both vehicular and 
pedestrian, provided that the accesses themselves, built in a trench with 


respect to the ground line, are not more than 3 m wide. 


Paragraph n - Maximum height of buildings (HMAX) This is the 
maximum between the heights of the different parts of the elevation into 
which the building can be divided, measured as in letter m) above. In the 
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case of elevations in which there are inclined roof pitches (gabled, 
staggered or single-pitched), the maximum height is considered to be that 
corresponding to the intersection of the elevation walls with the plane 
corresponding to the extrados of the roof pitch (1) as long as the ridge does 
not exceed the height measured in this way by more than 1.80 m, otherwise 
the maximum height is measured at the ridge line (see Figures 1,2,3 and 
4). (2) If the roof slopes coincide with the sloping walls of the elevations, 
the maximum height must always be measured to the ridge line (see 
Figures 5 and 6). (3) For buildings on land with a natural slope of more 
than 15%, the maximum height permitted by the town planning 
instruments, unless more restrictive prescriptions of the same, may be 
exceeded by 20% in the downstream parts of the elevations, with an 
absolute maximum of 2.00 (see Figure 7). 
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Fig. 3: Graphic Interpretation of Articles 61.2(a) (on the left) and 61.2(b) (on the right). 
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In the analysis of Article 61.3 shown in Figure 4, there are the values of the distances to be observed in 'other 
areas'. This concept was the subject of discussion with the municipality as it does not specify in detail in which 
Urban Zones it applies. Unlike the previous article, the rule to be respected in this case is that the 3 areas emerged 
from the comparison: Historic centre, Areas of completion and Areas of expansion. 
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Fig. 4: Resulting interpretation of Art. 61.3 based on a semantic mark-up and sentence-centric approach. 


Article 61.4 (Figure 5) deals with the distance that must be between the building and the road. First of all, it is 
necessary to identify the width of the road. In fact, the article refers to 3 brackets: less than 7 m (1), between 7 m 
and 15 m (2) and greater than 15 m (3). For these 3 cases it identifies the distance as the sum of the road width 
increased by a specific value: 5 m for the first case, 7.50 m for the second case and 10 m for the third case. Article 
61.6 (Figure 6) deals with the distance between the projecting parts of the building and the parcel boundaries. A 
requirement emerges that has been identified by the authors as an 'implicit piece of the regulation'. For the previous 
articles, the building line from which to calculate the distance referred to the external elevation understood as the 
external wall that must contain a window (Art.61.2a). In this article, the reference building line is moved from the 
outer wall to the reference overhang if the latter is less than 1.50 m away from the parcel boundaries. The 
projections referred to are precisely eaves cornices, open balconies and canopies. 
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Fig. 5: Resulting interpretation of Art. 61.4 based on a semantic mark-up and sentence-centric approach. 
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Fig. 6: Resulting interpretation of Art. 61.6 based on a semantic mark-up and sentence-centric approach. 


4. INFORMATION REQUIREMENTS DEFINITION 


As already mentioned in Section 2.2, Figure 7 shows the representation of the data required to verify the constraints 
specified in the Regolamento edilizio comunale of Ascoli Piceno. Each box represents an entity of building 
information models or city models, with its specific attributes. To facilitate the identification of entities belonging 
to the different models, colours were used, assigning a different colour depending on whether the entity should 
come from a BIM or city model. In particular, green was used to identify elements belonging to the building model 
(BIM) and blue to identify elements belonging to the city model (GIS). This distinction allows to have an 
immediate view of where the data should come from and highlights the necessary correlation between the data of 
the building and the geospatial data of the urban context in which it is located. This example makes it clear that 
the need to integrate BIM with GIS, and GIS with BIM, is increasingly essential. A grain size and tolerance must 
be defined to integrate these two types of data. 


Building information models and city models for checking distance-related requirements in a digital building 
permit use case should contain: 


e urban zones with an absolute location and for which the intended use (e.g., area of expansion; historic 
centre) is specified; 
cadastral parcels with an absolute location and for which parcel boundaries are modelled as well; 
existing buildings, meaning those that are already located in the urban context of the buildings for which 
the building permit is required. Existing buildings could be detailed with a simplified geometry that 
allows the extrapolation of the building height according to the relevant definition (e.g., a cube for a house 
with a flat roof that is not accessible). Existing buildings have to contain windows to allow building- 
building distance checks to be executed. Existing buildings have to be identified with their absolute 
location; 

e buildings, meaning the ones for which the building permit is required. Buildings need to be detailed with 
their actual shape and size to identify the building outline. For this reason, buildings need to be modelled 
including overhangs as balconies, canopies, roofs’ eave cornices and closed building extrusions. 
Appearance is not to be considered for this type of check. Moreover, external walls have to include 
windows. The type of construction could be a property to assign to buildings (e.g., new construction, 
renovation) for compliance checking; 
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e roads will have to be modelled including streets, sidewalks, parking spaces and flowerbeds. In fact, 
according to the definition in Table 2 (see paragraph q), the road is considered as the sum of several 
elements. The information content of road elements has to contain properties as type of road (e.g., cul- 
de-sac) and width. 
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Fig. 7: Conceptual model representing the entities, attributes and relationships that must be present in the model 
to apply for a digital building permit. 


5. CONCLUSIONS 
5.1 Discussion of results 


This paper describes the interpretation of distance-related regulations for the digitisation of the building permit 
use case for an Italian municipality of 45.000 inhabitants. Those regulations were selected as a priority by 
municipality officers, and they check the compliance of design proposal with regulatory requirements related to: 
building-building distance, building-parcel boundaries distance and building-building distance if a road is 
interposed. A semantic mark-up and sentence-centric approach has been adopted to extract normative constraints 
from the natural language of the regulatory text with the aim of translating it into a set of information requirements 
for the DBP use case. Information requirements have been defined as an essential input to city and building model 
preparation and they have been formalised in terms of information conceptual models. Domain-specific object 
types have been associated with required properties properly detailed. Moreover, object types have been matched 
with a preliminary categorisation into building-related and city-related objects. This could be the basis for the 
development of standard-oriented specifications for building and city information models. 


The rule interpretation step revealed several difficulties, proving to be, as already described in the existing 
literature, probably the most critical phase in digitising design compliance checking and, specifically, building 
permitting. First, despite the fact that the municipality officers initially indicated only one regulatory article as 
necessary for distance verification based on urban zones (i.e., Art. 61 from Regolamento edilizio comunale), it was 
needed to add further regulatory references to this explicit request, either from the same regulatory text or from 
others normative codes. In addition, it was necessary to interpret not only the regulatory requirements but also the 
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definitions to which they refer (e.g., the actual meaning of building-road distances and maximum building height). 
Moreover, to validate the extrapolated data and to remove ambiguities related to the complex regulatory textual 
form, a comparison with municipality officers was essential, which underlines, again confirming the literature, the 
interpretative subjectivity of building regulations that hardly fits with their digitisation. 


5.2 Limitations and future works 


Current outcomes show how, despite any fascinating narrative about the automated technical solutions, the hurdles 
to be tackled with and overcome imply to definitely understand a lot of multi-faceted meanings, unless a whole 
re-writing of the regulations might be figured out. While the case study is specific in location and regulations, the 
type of issues encountered are a generally applicable example for the building permit use case. Future works will 
extend the methodology to the analysis of distance-related regulations from additional three municipalities 
between 45.000 and 1.000.000 inhabitants, in other two European countries. The resultant level of information 
need will be formalised in terms of information conceptual models for the complete set of four municipalities and 
compared to point out the need for a flexible and scalable approach. To this end, the generated data set will be 
shared with professionals from municipalities, city modellers, and designers to allow the comparison and 
subsequent validation of the identified and analysed requirements. 


Moreover, what is proposed in this article is a first step to proceed, in future works, with deeper analysis in relation 
to semantics, level of details, geometric representations and GeoBIM interoperability. As a future work, the 
representation of design and context information according to BIM and geospatial standards will be considered as 
well as standard definitions for the level of information need as the one proposed by the EN Standard 17412, to be 
soon re-shaped as a EN ISO Standard. 
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ABSTRACT: This article systematically reviews digitalizations impacts on project management in the 
construction industry. The article discusses the challenges the construction industry in successfully applying 
digitalization during the construction phase, including high costs, subpar performance, low productivity, and 
sustainability issues. The article then outlines the research questions and methodology used to conduct the 
systematic review, including selecting acceptable research questions, selecting acceptable research questions, 
using a qualitative approach, and the content review approach. The study analyzed 21 papers and identified the 
primary research themes regarding the effects of digitalization on project management. The study found that digital 
technologies, such as smart building technology, digital twins, reality capture, and laser scanning, have positively 
impacted project management by increasing efficiency, accuracy, value, and safety. The study concludes that the 
construction industry must embrace digitalization to address its challenges and improve project management. 


KEYWORDS: Digitalization, Systematic review, Project Management, BIM, Construction Industry, Automation, 
Innovation, Construction Sector, Innovation, AI, Remote, Architect, Engineering 


1. INTRODUCTION 


Digitalization is changing the way construction projects are managed and impacting not only the design and 
construction phases but also facility management (Bazan et al., 2021). It is challenging to successfully apply 
digitalization during the most time-consuming, expensive, and collaborative period of a project—the construction 
phase (Qais K. Jahanger et al., 2021). With digitalization being the most promising option to address these 
difficulties, the construction industry faces a number of hurdles, including high costs, subpar performance, low 
productivity, and sustainability issues (Nikmehr et al., 2021). 


The implementation of Building Information Modelling (BIM) is one of how the construction industry is being 
digitalized, and its adoption is increasing globally, requiring the conversion of conventional building life cycle 
phases into BIM-integrated project deliveries (Yilmaz, Akcamete and Demirors, 2023). Strong relationships 
between frequently encountered independent BIM uses are a requirement for the Total BIM approach, where the 
principal contractual and legally binding construction document should be production-oriented BIM. Other key 
success elements include cloud-based model administration, user-friendly on-site mobile BIM software, and strong 
leadership (Disney et al., 2022). However, BIM deployment frequently has limitations, such as parallel 2D drawing 
production, which results in wasteful effort, and the need for structural engineers on-site to develop POVs for 
construction because collecting measurements using the software was not supported. Additionally, Tekla, which 
supports a 3D model-based approach for rebar and steel structures, is used in the examples, which largely focus 
on infrastructure projects (Disney et al., 2022). 


To fully utilize BIM and digitalization in project management, it is necessary to develop tools for documentation, 
registration, and data management that allow for effective information sharing among all stakeholders (Bazan et 
al., 2021). Moreover, the cost of construction expert’s services, including BIM services, which are now determined 
by established rates based on construction and equipment prices, can be more accurately predicted with the help 
of historical data (Nguyen, Dang and Nguyen, 2022). Using BIM-Facility Management with three-dimensional 
information models created during construction, BIM-based solutions offer the opportunity for collaborative, 
multidisciplinary workflows and full life cycle consideration (Bazan et al., 2021). 


Apart from BIM, the construction industry is also moving towards Construction Automation and Robotics (CAR) 
(Pons-Valladares et al., 2023). This is because the upkeep and building of conventional structures account for 
roughly 30-40% of global energy usage and greenhouse gas emissions into the built environment, posing a threat 
to the environment due to the significant amount of resources and energy required in these buildings (Ejidike and 
Mewomo, 2023). According to the UN, the construction industry is responsible for around 40% of global energy 
use, nearly 40% of waste production, and roughly 30% of greenhouse gas emissions related to energy (Reinbold 
et al., 2022). 
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As part of industrial digitalization, the manufacturing sector makes use of digital technologies like AI, cloud 
processing, and IoT. This speaks of initiatives taken to increase output using digital technology, which can have 
an effect on all areas of an organization and alter current business procedures (Carlsson, 2023). Effective 
strategizing of organizational capabilities relies on managers' capacity to elucidate the socio-cognitive factors, and 
it contributes to management practice by highlighting the divergent opinions among managers concerning digital 
technology's influence on efforts to improve the organization (Carlsson, 2023). Additionally, a potential solution 
to the lack of focus on workers in digitalization research for the construction industry is digital visual management 
(DVM), which aims to provide real-time visual information that can be easily accessed and utilized during 
production activities to increase transparency and communication. However, understanding the various actors' 
information needs is crucial to effectively implement DVM (Reinbold et al., 2022). 


The growth of digital methods is transforming the design and construction business, impacting not only how the 
sector operates but also the range of output; yet, there is a dearth of critical analysis because the majority of research 
mainly concentrate on creating the approaches (Brooks, Zantinge and Elghaish, 2022). The lack of comprehensive 
analysis of this disaggregated knowledge on the application of digital technology in project management informed 
the current systematic literature review. This study aimed to comprehend how digitalization has affected project 
management. The study objectives are; 


e To examine the project management literature currently available on digitalization. 
e To determine how project management is currently using digitalization. 
e To identify the obstacles preventing the use of digitization in project management 


The building industry faces the challenge of creating comprehensively built infrastructures that meet the demands 
of a growing population, urban sprawl, and globalization. This includes implementing efficient energy 
management, ensuring proper water supply, providing indoor comfort for occupants, and managing construction 
waste (Ejidike and Mewomo, 2023). The substantial financial burden of maintaining and repairing deteriorating 
facilities is highlighted by the necessity for improved management and inspection systems given the size of some 
countries! infrastructure (Bazan et al., 2021). The successful use of digital technology by a construction company 
depends on having sufficient knowledge of the organization, including its structure, nature of work, and human 
resource characteristics (Nikmehr et al., 2021). 


2. RESEARCH METHOD 


The basis for increasing knowledge in a certain discipline, such as architecture and construction, among others, is 
a thorough review of the scientific literature. The study employed a systematic review methodology, in accordance 
with the strategy described by Denyer and Tranfield, to thoroughly assess the body of literature and arrange 
knowledge into a trustworthy manner that can influence practices (Ejidike and Mewomo, 2023). 


This literature review with a qualitative approach aims to explore the advantages of digitalization in project 
management in the construction industry. The review employs the Scopus search engine to locate pertinent papers. 
Due to Scopus's high data recovery accuracy and precision, the researcher chose to use it for their literature search. 
The investigation considers papers published between 2015 and 2022, using the following search strategy: 
(TITLE-ABS-KEY ("Digitalization" AND Construction AND Management)). 


The initial search resulted in 129 articles that met the criteria. Inclusion and exclusion criteria were applied to 
ensure that only relevant papers were selected. The criteria stipulated that the articles must be in English and deal 
with engineering. Any papers that did not meet these criteria were discarded. The review specifically focused on 
digitalization in project management, analyzing pertinent academic journals and conference papers. Despite the 
thorough search, not all of the articles that were found appeared to be pertinent to the study's specific focus. 
Therefore, the researcher used a content review approach to further filter and select the most relevant papers for 
the investigation. 


The content review approach involved a three-step process. First, the researcher examined the article's topical 
relevance to the study. Second, the researcher analyzed the abstract to determine whether the article provided 
sufficient information related to the study's focus. Finally, the researcher reviewed the article's findings as they 
were reported in the literature. After applying the content review approach, a total of 21 papers were selected for 
examination. These papers were analyzed in detail, and their findings were synthesized to develop a comprehensive 
understanding of the advantages of digitalization in project management in the construction industry. 
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To ensure the reliability of the literature review, the researcher used a systematic approach to the search, screening, 
and selection of articles(citation). The inclusion and exclusion criteria were clearly defined and consistently 
applied throughout the study. The content review approach provided a structured and transparent method for 
selecting the most relevant papers for the investigation. 


The primary research themes regarding the effects of digitalization on project management were identified and 
categorised in this study, with the hopes that the findings will help other researchers better understand current 
events, trends, and potential areas for research and innovation in the AEC industry. 


3. RESULT AND DISCUSSION 
3.1 Qualitative Analysis and Discussion 
3.1.1 The state-of-art of Digitalization in project management 


Digitalization in project management is a growing trend that is transforming the Architecture, Engineering, 
Construction, and Facility Management (AEC-FM) sector in many ways. Digital technologies, such as IoT, UAVs, 
3D printing, AR, VR, MR, BIM, AI, and DSS, are increasingly being used to solve problems with cost, rework, 
efficiency, safety, quality, and productivity (Nikmehr et al., 2021). Web-based project systems, digital meetings, 
and BIM have been available for some time, but they are not always fully utilized. However, the right digitalization 
approach can reverse productivity declines, and European Directives for Procurement are pushing for more radical 
digital transformations with support for R&D and training (Prebani¢ and Vukomanović, 2021). 


Project managers are responsible for determining how to use ICT tools to involve project stakeholders effectively, 
and there is extensive research on the digitalization of construction project management practices (Prebani¢ and 
Vukomanović, 2021). Digital technology has been used in the construction industry in the past, but its scope was 
limited due to technological constraints. However, advancements in processing power have given construction 
organizations the opportunity to combine their skills and enhance their processes using digital technology 
(Lundberg, Nylén and Sandberg, 2022). 


Digital twins, which are virtual replicas of physical things that faithfully reproduce all of their properties, including 
how they behave under actual use-case scenarios, can aid in the integration of various information technologies 
into a single digital platform and twin for construction projects (Ryzhakova et al., 2022). Reality Capture (RC), a 
valuable tool for construction project management, can increase project efficiency, correctness, value, and safety 
by incorporating building geometry, build typology, and material amounts in 3D models (Fobiri, Musonda and 
Muleya, 2022). Similarly, laser scanning, which is safe and non-invasive and efficiently and precisely organizes 
space, is gaining popularity in the fields of design, engineering, and construction, all of which contribute to the 
achievement of sustainable development goals (Fobiri, Musonda and Muleya, 2022). Digital technologies are 
having an increasingly negative impact on the AEC-FM sector in two ways: the monitoring of sensor network data 
and the easy management of automation systems (Hosamo et al., 2022). 


Both industrialized and developing countries have adopted sustainable construction techniques, such as green roofs 
and structures, modular construction, information modeling, and smart building technology (SBT), to improve 
their processes (Ejidike and Mewomo, 2023). 


3.1.2 The impacts of Digitalization in project management 


As technology has advanced, building experts have increasingly used it to improve energy management, 
environmental protection, economic efficiency, and human comfort (Ejidike and Mewomo, 2023). By encouraging 
sustainable construction methods that limit waste output, maximize resource consumption, and minimize 
environmental effect, smart building technology adoption benefits professionals, clients, and the nation as a whole 
in developing countries (Ejidike and Mewomo, 2023). For the purpose of enhancing and facilitating a specific 
activity, new digital technologies like computer modeling, digitalization, and creative business processes are being 
developed (Brooks, Zantinge and Elghaish, 2022). 


Digitalization has the ability to improve project management and delivery through the use of construction or 
document software solutions, which includes contractor, document management, process management, and 
activity monitoring and oversight (Qais K. Jahanger et al., 2021). A construction company must have a strong 
commitment to change in order to achieve real digital transformation, as opposed to running analogue and digital 
systems in parallel. It also needs to develop and implement a defined digitalization transition strategy (Nikmehr et 
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al., 2021). BIM and other digital technologies have been utilized successfully in construction projects all around 
the world, showcasing their capacity to manage multidisciplinary teams, spot conflicts, and minimize rework in 
significant projects (Nikmehr et al., 2021). 


The development of 3D models made possible by the application of RC technologies in construction management 
provides clients, consultants, and contractors with a singular learning chance to interact with and define faults, 
structural analyses, constructability challenges, risks, and costs in real-time, in a secure, hazard-free environment 
(Fobiri, Musonda and Muleya, 2022). Although Big Data technology can assist with analysis and processing of 
the large and complex datasets created by construction projects, it is still difficult to integrate data from various 
automated systems to create a digital twin (Ryzhakova et al., 2022). The Digital Twin can interact with other 
simulators and programs by incorporating data and information from throughout an asset's existence, making it a 
crucial decision-making tool for the duration of the asset (Hosamo et al., 2022). 


In order to meet market demand and improve their value offer, the UK's changing construction agenda is 
encouraging construction companies to adopt new business models that make use of digital technology and 
manufacturing processes to develop and deliver whole life value (Cidik and Boyd, 2022). The construction 
industry's methods for generating and capturing value change as a result of digitalization, which causes a transition 
from project-based logic to production-based logic (Cidik and Boyd, 2022). The construction industry's 
interdisciplinary, dispersed, and temporary project organizations, as well as the interdependencies between 
stakeholders, add complexity and make it difficult to meet project requirements for cost, time, and productivity. 
This calls for improved integration, cooperation, interaction, and coordination (Prebani¢ and Vukomanović, 2021). 


3.1.3 How Digitalization impacts on Project Management 


The push for digitization in construction highlights the need for change in the way construction is planned and 
carried out (Brooks, Zantinge and Elghaish, 2022). Digitalization in construction practices uses digital 
technologies to create new organizational practices (Lundberg, Nylén and Sandberg, 2022). A digital 
representation of the real world can be made using Reality Capture (RC) technology, making it easier to plan, 
oversee, and evaluate construction, engineering, and architectural projects (Fobiri, Musonda and Muleya, 2022). 
The use of digital models of buildings and planned project activities can support tasks at different levels, such as 
transaction conclusion, securing investment support, and thorough assessments of capital construction objects 
(Ekaterina Tereshko, 2021). Building assets must be managed, and facility management (FM) calls for handling a 
lot of data, which is currently kept in paper documents that are prone to theft (Siccardi and Villa, 2023). 


Through the seamless integration of information using information-based systems like Building Information 
Modeling (BIM), AEC-FM operations can be improved, projects can be more efficiently completed, and their 
efficacy over their whole lifespan can be increased (Hosamo et al., 2022). Throughout the entire building 
construction process, risk prevention and safety planning have advanced significantly thanks to the use of BIM 
methods and digital twins, which enable accurate data collection and 3D visualization (Torrecilla-Garcia, J., Pardo- 
Ferreira, M., Rubio-Romero,J., 2021). A technology was created to build and analyse models of control processes 
as well as create strategies and tactics to apply the findings to real-world demands (Ereshko et al., 2022). Digital 
construction-phase information management (DCIM) systems can be an effective solution for reversing the fall in 
productivity in the construction industry, however certain governmental agencies have not yet completely adopted 
this technology (Qais K. Jahanger et al., 2021). An information system with an intuitive design and standardized 
methods for data extraction, transformation, and loading can guarantee the long-term archiving, updating, and 
synchronization of metadata and data from different information systems while also safeguarding it from 
unauthorized access by other project participants (Ryzhakova et al., 2022). 


Convenience, cost savings, a wider range of options, greater information, improved sustainability, higher 
communication quality, increased customer satisfaction, and success of economic models and investments are just 
a few of the advantages that digitalization brings to industries (Nikmehr et al., 2021). The construction industry's 
interdisciplinary, dispersed, and temporary project organizations, as well as the interdependencies between 
stakeholders, add complexity and make it difficult to meet project requirements for cost, time, and productivity. 
This calls for improved integration, cooperation, interaction, and coordination (Prebani¢ and Vukomanović, 2021). 
Because of the intricate nature of projects and the involvement of several contracting parties with potentially 
competing interests, stakeholder management, which includes stakeholder analysis and engagement, is essential 
in the construction industry (Prebani¢ and Vukomanović, 2021). 


To ensure the adoption of smart building technology in underdeveloped countries, it is essential to fully 
comprehend its benefits, make the concept more familiar to building experts, and promote its successful use in 
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those areas (Ejidike and Mewomo, 2023). The current emphasis on digitization in the agenda for transforming 
construction assumes that integrated digital technologies can identify coordination issues and enable better inter- 
organizational cooperation, but this perspective poorly articulates how value is formed and ignores the practical 
difficulties of digitization (Cidik and Boyd, 2022). 


4. RECOMMENDATIONS AND DIRECTIONS FOR FUTURE RESEARCH 


Digital technologies have disrupted many industries, and project management is no exception. Digitalization has 
enabled project managers to leverage new tools and techniques to improve project outcomes and collaboration, 
but it has also brought new challenges related to data privacy, security, and governance. To address these challenges 
and leverage the benefits of digitalization, there is a need for further research in the impacts of digitalization in 
project management. This article provides recommendations and directions for future research in this field. 


The first recommendation is to investigate the adoption rate of digital technologies in project management and 
identify factors influencing its adoption. This research can provide insights into the current state-of-art of 
digitalization in project management and help project managers understand the factors that drive or hinder its 
adoption. For example, research could explore the impact of organizational culture, leadership support, and digital 
competencies on adopting digital technologies in project management. 


The second recommendation is to examine the role of digital technologies in enabling remote project management 
and explore the impact on team collaboration and communication. With the rise of remote work, project managers 
need to adapt to new ways of working and leverage digital technologies to collaborate effectively with team 
members who are geographically dispersed. Research could explore the impact of digital tools such as video 
conferencing, instant messaging, and collaboration platforms on team communication, collaboration, and 
performance. 


The third recommendation is to analyze the impact of digitalization on project outcomes, including project 
completion time, cost, quality, and scope. Digital technologies can potentially improve project outcomes by 
enabling more efficient and effective project planning, execution, monitoring, and control. However, research is 
needed to understand the specific impact of digital technologies on different project outcomes and to identify best 
practices for leveraging digital technologies to improve project outcomes. 


The fourth recommendation is to investigate the impact of digital technologies on project risk management, 
including identifying, assessing, and mitigating risks. Digital technologies can enable more effective risk 
management by providing project managers with real-time data and insights into project risks. Research could 
explore the impact of digital tools such as risk management software, predictive analytics, and machine learning 
on project risk management and identify best practices for leveraging these tools to mitigate project risks. 


The fifth recommendation is to explore the impact of digitalization on project management processes, including 
project planning, execution, monitoring, and control. Digital technologies can enable more efficient and effective 
project management processes by automating tasks, providing real-time data, and enabling more effective 
collaboration. Research could explore the impact of digital tools such as project management software, project 
management dashboards, and agile methodologies on project management processes and identify best practices 
for leveraging these tools to improve project management processes. 


5. CONCLUSION 


In this systematic review, the aim was to investigate the impacts of digitalization on project management in the 
construction industry. The study revealed both the challenges faced during the implementation of digitalization, 
such as high costs and sustainability issues, and the positive outcomes resulting from the adoption of digital 
technologies. Through a comprehensive analysis of 21 papers, the study identified key research themes 
highlighting the transformative potential of digital tools like smart building technology, digital twins, reality 
capture, and laser scanning. These technologies have proven to enhance project management by increasing 
efficiency, accuracy, value, and safety. 


The findings emphasize the need for the construction industry to fully embrace digitalization. Integrating Building 
Information Modelling (BIM) and other digital tools facilitates effective information sharing, enabling 
collaborative and multidisciplinary workflows throughout the construction lifecycle. Additionally, Construction 
Automation and Robotics (CAR) are emerging as essential components in sustainable construction practices. To 
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successfully leverage digitalization, organizations must adopt a defined digitalization transition strategy and 
genuinely commit to a digital transformation, considering the organization's structure, nature of work, and human 
resource characteristics. 


Looking ahead, further research is recommended to explore the adoption rate of digital technologies in project 
management and the factors influencing their successful implementation. Understanding the role of digital 
technologies in enabling remote project management and their impact on team collaboration and communication 
is crucial in adapting to evolving work practices. Moreover, in-depth analysis of the impact of digitalization on 
project outcomes, risk management, and project management processes will provide valuable insights and best 
practices for the industry. Ultimately, embracing digitalization presents not only a necessity but also an opportunity 
for the construction industry to thrive in an increasingly digital world, delivering successful projects that meet the 
demands of a growing population and sustainability requirements. 
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EVALUATING THE COMPREHENSION OF CONSTRUCTION 
SCHEDULES OF AN ARTIFICIAL INTELLIGENCE 


Tulio Sulbaran, Ph.D. 
The University of Texas at San Antonio, Texas, United States. 


ABSTRACT: Construction schedules are an important tool to communicate with the project stakeholders and are 
critical for the project management team to plan, coordinate, and manage construction projects. Each construction 
project has a unique schedule that is created based on the construction drawings, specifications, contracting 
requirements, construction methods, and the judgment of the project management team. Therefore, each 
construction schedule is unique in many aspects such as the number of activities, the names of the activities, the 
duration of those activities, and the relationship between the activities. The names of the activities are of particular 
interest as they are the critical core unit to creating the schedule. Furthermore, the activities are the ones that 
bring together all other aspects of the schedule. Unfortunately, there is no standard naming conversion for those 
activities and they vary from project to project as well as from project management team to project management 
team. This inconsistency of the activity name makes it extremely challenging for both humans and machines to 
understand the meaning and scope of the activities. Thus, the problem that this paper addresses is the challenge 
faced by machines to comprehend the activities of a construction schedule. Therefore, the objective of this paper 
is to evaluate the ability of an Artificial Intelligence (AI) implementation to comprehend activities in a construction 
schedule. This research was conducted following a mixed research method. The AI implementation training was 
done by providing the Construction Specifications Institute (CSI) Master Format activity list to a Sentence 
Transformer. Then the AI was given the task of interpreting the activities of a construction schedule according to 
the 50 Divisions of the CSI Master Format. A group of senior construction students was also given the same 
interpretation task. The evaluation was done by comparing the results of the AI vs the humans for each of the 
activities in the construction schedule. The result was that the AI has 0.56 accuracy, 0.50 precision, 0.85 recall 
and, 0.64 F1 Score. This result is very promising and it supports further research to refine the AI to increase its 
ability to comprehend construction schedule activities. Upon achieving a higher level of comprehension an AI 
could be used to assist humans in the preparation of construction schedules or perhaps prepare drafts of the 
construction schedules for the human to review. 


KEYWORDS: Construction Scheduling, Decision Support, Artificial Intelligence, Comprehension 


1. BACKGROUND 


Construction scheduling is a complex process with a lot of considerations for successful project delivery with 
different and specific approaches for each scheduling constraint (Okonkwo et al., 2022). Preparing good schedules 
is a time-consuming process that requires a deep understanding of the construction process (Sulbaran, & Ahmed, 
2017). Construction schedules serve many purposes ranging from informing owners on state of progress, 
establishing long-term coordination among crews and trade contractors, to specifying terms of payment (Halpin 
& Senior, Bolivar, 2017). The construction schedule is one of the most important planning and control tools for 
the construction process (Roston et al., 2020), frequently includes a very large number of activities (Essam et al., 
2023) and it is the core of the project plan. It is used by the project management team to commit resources to the 
project and show the organization how the work will be performed (Magalhaes-Mendes, 2011). The main goal of 
a construction schedule is to identify the activities needed to complete a project and sequence them in the most 
efficient way possible within the timeframe and resources available (Essam et al., 2023). Construction scheduling 
is a complex process due to the interdependence and contradiction of project activities (Essam et al., 2023). 
Construction schedule practices rely heavily on manually elaborated descriptions of construction means and 
methods (Amer & Golparvar-Fard, 2019). The preparation of a construction schedule including the number of 
activities, the names of the activities, the duration of those activities, and the relationship between the activities 
which heavily relies on the judgment and expertise of the project management team. 


The names of the construction activities are the only unstructured data attribute in the construction schedules (Hong 
et al., 2021). Construction activities are described using Natural Language expressions with little or no 
standardization, grammatical errors, abbreviations, project and construction-specific terms (Heigermoser et al., 
2019). Construction activities have been widely discussed in the construction literature (Amer & Golparvar-Fard, 
2019) as they are critical in construction schedules. The activity names are devised to communicate between 
stakeholders, however, they are often written using inconsistent terminologies with omitted contextual information 
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(Hong et al., 2021). The inconsistency and omissions are due in part because construction schedules are prepared 
by project management teams’ using their tacit knowledge. The tacit knowledge is the common knowledge on the 
process of conformance checking that is applied by domain experts (Yurchyshyna & Zarli, 2009). This 
inconsistency in the activity names is further aggravated by the variety of construction means and methods to 
perform construction activities and the differences in practice between different construction companies (Amer & 
Golparvar-Fard, 2019). It is also the case that in most instances, historic information including scheduling decision 
reasoning is not documented and disseminated for use in other future projects (Hong et al., 2022). Although 
construction companies might establish procedures to propagate their construction scheduling knowledge between 
different projects and teams (Amer & Golparvar-Fard, 2019), it is ultimately the project management team that 
prepares the construction schedule. This current scheduling practice leads to activities written in an inconsistent 
format with inconsistent terminologies (Hong et al., 2021) which makes it extremely challenging for both humans 
and machines to understand the meaning and scope of the activities. 


The problem addressed by this paper is the challenge faced by machines to comprehend the activities of a 
construction schedule. Thus, the objective of this paper is to evaluate the ability of an Artificial Intelligence (AD 
implementation to comprehend activities of a construction schedule. The AI’s ability to comprehend construction 
activities is critical to further advance the AI competence to assist project management team in the preparation of 
construction schedules or perhaps prepare drafts of the construction schedules for them to review and fine tune. 


Artificial intelligence (AI) is poised to rapidly transform businesses particularly the construction industry. 
Although, AI is still a new technology in the construction industry, it has the potential to have a major impact 
particularly in construction schedules. AI powered scheduling tools could help the project management teams 
create more accurate and efficient schedules, which could lead to significant cost savings and time savings. 
Optimized schedules are expected to yield significant cost savings over the actual schedules employed (Kettunen 
& Kwak, 2018). 


Artificial Intelligence has many branches and sub-branches as shown in Figure 1. Artificial Intelligence is the 
capability of a device to perform functions that are normally associated with human intelligence, such as reasoning 
and optimization through experience (Grewal, 2014). Artificial intelligence brings into being machines that 
respond to stimulation consistent with traditional responses from humans, given the human capacity for 
contemplation, judgment and intention (Grewal, 2014). 


A subset of Artificial Intelligence (AI) is Machine Learning (ML) in which intelligence is provided to a system so 
that it can act automatically make decisions depending on the past experiences (Tiwari, 2022). Machine learning 
focuses on the development of algorithms that can learn from data without being explicitly programmed. ML 
algorithms are typically trained on large datasets of labeled data, and they can then be used to make predictions or 
decisions on new data. One of the types of machine learning is unsupervised learning in which the algorithm is 
not given any labeled data. Instead, the algorithm is given unlabeled data and it must find patterns in the data on 
its own. Unsupervised learning algorithms try to infer a function to find hidden relations between data points 


(Tiwari, 2022). 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


2. RESEARCH METHODOLOGY 


A mixed research method was used in this research. The mixed research method draws largely on quantitative and 
qualitative research (Leedy et al., 2019). Despite its advantages in comparison to mono methods, mixed methods 
research had been underutilized in the management sciences (Molina-Azorin & Cameron, 2010). However today, 
mixed methods research is increasingly being used in many disciplines (Bentahar & Cameron, 2015). The use of 
mixed research method has increased so much that a specialized journal is devoted specifically to mixed methods 
research - The Journal of Mixed Methods Research, published by Sage (Bentahar & Cameron, 2015). The mixed 
method was used in this research because both non-numerical and numerical data were needed to evaluate the 
ability of an AI implementation to comprehend activities of construction schedules. The implementation of the 
mixed research method was done in four stages: data collection, AI training and preparation, activity interpretation, 
and analysis as shown in Figure 2. 
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Figure 2: Research Stages 


- Data Collection: data of a construction schedule as well as data regarding the Construction Specifications 
Institute (CSI) Master Format (MF) 50 divisions activity codes and descriptions were gathered. 


- Artificial Intelligence Training and Preparation: the CSI MF 50 divisions activity codes and descriptions were 
used to train the machine using sentence transformer with a BERT encoder. During the stage also the activities 
from the construction schedule were extracted and used to automatically create the question of a survey to be 
deployed on-line through Qualtrics. 


- Activity Interpretation: the activities of the schedule were provided to a group of humans and an AI. They both 
were asked to interpret the activities in accordance to the CSI MF 50 divisions. The humans completed the tasks 
through the online-survey in Qualtrics while the AI completed using the cosine similarity metric. 


- Analysis: confusion matrix was used to evaluate the AI comprehension of activities in the construction schedule 
including four metrics — accuracy, precision, recall, and F1 scores to provide a complete picture of the AI 
performance. 


3. RESULTS 
3.1 Data Collection 


The construction schedule gathered for this research project was composed of 94 activities from notice to proceed 
to final completion. The project was a 6,500 SF, single story, steel frame, metal stud, gypsum partitions with 
loadbearing brick and block. The project was a court house in a city in the United States with an approximate cost 
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of $300/SF and a total estimated cost of approximately 2 million dollars. 


The Construction Specifications Institute (CSI) Master Format (MF) 50 divisions activity codes and descriptions 
gathered for this project were composed 7533 individual activities grouped in 35 divisions currently activity from 
Division 00 — Procurement and Contracting Requirements to Division 48 — Electrical Power Generation. 


3.2 Artificial Intelligence Training and Preparation 


The training of Artificial Intelligence (AI) was done in Jupyter Notebook which is a free, open-source, interactive 
web tool known as a computational notebook (Perkel, 2018). Jupyter Notebook was used because it has emerged 
as a de facto standard for data scientists (Perkel, 2018). The programming code in Jupyter Notebook was done 
using Python taking advantage of the Sentence Transformers framework to compute semantic similarity and 
develop the embedding model (Devika et al., 2021). An embedding model is a type of machine learning model 
that is used to represent words or other discrete entities as real-valued vectors. The real-valued vectors were created 
using the Bidirectional Encoder Representation Transformers (BERT) Natural Language Inference (NLI) which 
maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or 
semantic search (Devika et al., 2021). The BERT-NLI models was provided the list of the Construction 
Specifications Institute (CSI) Master Format (MF) 50 divisions activity codes and returned the corresponding 
vector for each of the 50 divisions. 


The preparation of the survey was done by uploading the construction schedule into a Jupyter Notebook. The 
Jupyter Notebook extracted the 94 activities from the construction schedule and automatically prepared the 94 
questions using the template shown in the Table la. Additionally, the questions were grouped into four quartiles 
according to the cosine of similarity values between the activity and the CSI MF 50 Divisions per the AI 
Interpretation of the Activities as shown in Table 1b. The four group of questions were uploaded into Qualtrics. In 
Qualtrics, randomized sub-set of questions to be shown to each participant from each quartile were entered as 
shown Table 2. 


Table 1: Template, Quartiles, and Number of Questions 


a. Questions Template b. Quartile and Number of Questions 
Activity Description: <Activity from Construction Schedule 
Here> 
Quartiles Cos Similarity Number of 
Select from the pulldown below the CSI Master Format Values Activities 
Division for which the activity description above (in bold) Top 25% More than 0.87524 
belongs to. Second 25% 0.875 to 0.831 23 
If the activity does NOT belong to any of the CSI Master Third 25% 0.830 to 0.779 23 
Format select "None of the Above”. Bottom 25 Less than 0.779 24 


3.3 Schedule Activity Interpretation 


The first part of the construction schedule activity interpretation was done by humans. To ensure that the 
participating humans could answer the questions within 15 minutes, only the randomized sub-set of questions were 
provided to each participating human. The sub-sets were composed of 18 of the 94 questions. In the 18 questions, 
there were three questions from the top two quartiles and six questions from the bottom two quartiles as shown in 
Table 2. This decision of having more questions from the bottom two quartiles was done because it was 
anticipated that there was going to be a lower percentage of AI construction activity interpretation that were going 
to match the interpretation from the participants. Additionally, none of the questions were mandatory, so the 
participants could skip some of the questions resulting in a total of 316 answers from the participants. The second 
part of the construction schedule activity interpretation was done by AI using the BERT-NLI model. The AI was 
given the same construction schedule activities with the same questions given to the human. 
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Table 2: Questions Template and Number 


Quartiles Questions Per participant Total Questions Answered 
Top 25% 3 51 

Second 25% 3 53 

Third 25% 6 105 

Bottom 25% 6 107 

Total 18 316 


3.4 Artificial Intelligence Schedule Activity Analysis 


The task of interpreting the activities of a construction schedule according to the 50 Divisions of the CSI Master 
Format was completed first by eighteen human participants. The participants’ demographic was as follows: 77.8% 
Hispanics, 61.1% between 20 and 24 years old, 77.8% males, and 55.6% with 1 to 5 years work experience. The 
responses of the participants were grouped in the same four quartiles of questions as shown in Table 1 then the 
answers of the AI were also grouped in according to the four quartiles. If the answer of the AI matched the answer 
of the humans, the answer was considered a match if not it was considered a no match. The AI identification of 
the activities match the human answer on average 50% for the first the three quartiles which correspond to the 
quartiles that the AI was expected to match the human answer. Likewise, the AI identified activities did not match 
the human answer in the bottom quartile 75% of the times as expected. 


Table 3. Questions Template and Number 


Quartiles Number of Number and % of Number and % of Number and % Number and % 
Activities Match No match of Match of No match 

Top 25% 24 14 (58.3%) 10 (41.7%) 

Second 25% 23 10 (43.5%) 13 (56.5%) 35 (50.0%) 35 (50.0%) 

Third 25% 23 11 (47.8%) 12 (52.2%) 

Bottom 25% 24 6 (25.0%) 18 (75.0%) 6 (25.0%) 18 (75.0%) 


Furthermore, the Artificial Intelligence (AI) comprehension of the scheduling activities was also done using a 
confusion matrix. A confusion matrix represents the prediction summary in matrix form (Tiwari, 2022). It is a tool 
to determine the performance of the AI useful to identify areas where the AI may need improvement. The confusion 
matrix is useful because shows how many predictions are correct (true) and incorrect (false) per class (Tiwari, 
2022). The two classes used in this research were that the AI was either expected to identify (top three quartile) or 
no identify (bottom quartile) the activities in the construction schedule. 


The values used in the confusion matrix for the AI Activity interpretation correspond to the first top three quartiles 
for identified and the bottom quartile for the not identified. Also, for the actual activity identified corresponds to 
the match while the not identified correspond to the no match. As shown in Figure 3, the confusion matrix has two 
rows and two columns with four possible outcomes (true positive, false negative, false positive, and true negative). 
The top left quadrant shows the number of true positives, which are cases where the AI implementation correctly 
identified the activity. The bottom left quadrant shows the number of false negatives (also known as type II error), 
which are cases where the AI implementation was not expected to identify the activity but was in fact able to 
identify the activity. The top right quadrant shows the number of false positives (also known as type I error), which 
are cases where the AI implementation was expected to identify the activity, but provided the wrong activity 
interpretation. The bottom right quadrant shows the number of true negatives, which are cases where the AI 
implementation was not expected to identify the activity and in fact was not able to identify the activity. 
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Figure 3: AI Interpretation of Construction Activities Confusion Matrix 


The confusion matrix information was used to calculate four metrics — accuracy, precision, recall, and F1 scores 
to provide a complete picture of the AI performance in comprehending the activities in the construction schedule. 


- Accuracy: is used to find the portion of correctly interpreted activities. In other words, measures how often the 
Al is correct. The value ranges from 1 for 100% accurate to 0 for 0% accurate. The equation used to calculate 
accuracy is presented in Equation | resulting in the AI having an accuracy to identify activities of 0.56. 


B 35+18 _ 
~~ 35+18+35+6 


TP+TN 


a 0.56 
TP+TN+FP+FN 


Accuracy = Equation 1 


- Precision: is used to calculate the Al's ability to interpret positive values correctly (True). In other words, 
measures how often the AI correctly identified the activity when it was expected to do so. Precision is equal to 
the ratio of the number of construction activities correctly interpreted to the total number of construction activities 
predicted. The value ranges from 1 for 100% precise to 0 for 0% precision. The equation used to calculate 
precision is presented in Equation 2 resulting in the AI having a precision to identify activities of 0.50. This result 
is consistent with literature as fully AI automated approach is still immature to be used in the industry where the 
best model scored 0.511 precision (Amer et al., 2021) 


ae TP 35 
Precision = = = 


Equation 2 
TP+FP 35+35 


0.50 


- Recall: (also called sensitivity) is used to calculate the AI's ability to interpret activities among all the activities. 
In other words, measures how often do the AI correctly identified the activities weather or not is expected to do 
so. Recall is the ratio of the number of construction activities correctly interpreted to the total number of 
construction activities interpreted. The value ranges from 1 for 100% recall to 0 for 0% recall. The equation used 
to calculate precision is presented in Equation 3 resulting in the AI having a recall to identify activities of 0.85. 


TP 35 


Equation 3 
TP+FN 35+6 


= 0.85 


Recall = 


-F']-Score: is the harmonic mean of Recall and Precision. In other words, it is useful when a balance between 
Precision and Recall needs to be taken into account. 

2* Precision*Recall _ 2*0.50*0.85 
~ 0.50+0.85 


= 0.63 Equation 4 


F1 Score = 


Precision+Recall 


4. SUMMARY 


The construction schedule activity names are of particular interest as they are the critical core unit to create the 
schedule. Unfortunately, there is no standard naming conversion for those activities and they vary from project to 
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project as well as from project management team to project management team. This inconsistency of the activity 
name makes it extremely challenging for both humans and machines to understand the meaning and scope of the 
activities. Therefore, the objective of this paper was to evaluate the ability of an Artificial Intelligence (AI) 
implementation to comprehend activities in a construction schedule. Following a mixed method in this research, 
the AI was implemented using the Bidirectional Encoder Representation Transformers (BERT) Natural Language 
Inference (NLI) with the list of the Construction Specifications Institute (CSI) Master Format (MF) 50 divisions 
activity codes. The result was that the AI has 0.56 accuracy, 0.50 precision, 0.85 recall and, 0.64 F1 Score. 


5. FUTURE WORK 


Despite the AI not being 100% accurate, this paper opens a wide variety of future research opportunities grounded 
on the mixed method used in this research with the four stages (Data Collection, AI Training and Preparation, 
Activity Interpretation, and Analysis). Some of those future research opportunities include: 1- Used other method 
to evaluate the NLP such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) 
curve, 2- Evaluating other NLP Encoder, 3- Comparing Performance among multiple NLP Encoders, 4- Implement 
a mixture of unsupervised and supervise NLP, and 5- Expand the number and type of schedule activities to be 
implemented just to mention a few. Upon achieving a higher level of comprehension future research could be 
directed towards using AI to assist humans in the preparation of construction schedules or perhaps prepare drafts 
of the construction schedules for the human to review. 
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MULTI-ROBOT FEDERATED EDGE LEARNING FRAMEWORK FOR 
EFFICIENT COORDINATION AND INFORMATION MANAGEMENT 
IN SMART CONSTRUCTION 


Xingi Liu, Jihua Wang, Ruopan Huang & Wei Pan 
The University of Hong Kong, Hong Kong SAR 


ABSTRACT: Smart construction involves a growing array of devices that generate extensive data, capable of 
enhancing construction efficiency and productivity. Nonetheless, the handling of this diverse and abundant 
information, along with the geographical spread of construction sites, poses challenges to effective communication 
and information processing within the management system. Multi-robot systems, as a new type of Internet of Things 
device, have the potential ability to coordinate workers to complete their work while serving as an edge node for 
information storage and processing. This paper presents a multi-robot federated edge learning framework that 
facilitates construction information management and communication. The work demonstrates the role of 
distributed databases in processing information during project execution, in contrast to centralized information 
systems. To address the intricacies of construction sites and the wide array of equipment involved, unmanned 
aerial vehicles and quadruped robots are employed as edge nodes. The formation of a federated edge learning 
framework ensures the real-time processing of massive data and data privacy issues. The Federated Multi-Robot 
(FedMR) framework is a global sharing model focused on preserving differential privacy protection. This 
framework is distributed to multiple edge robots in each round, enabling local real-time processing of robot tasks. 
The system can accomplish target detection and tracking of workers based on computer vision. Additionally, we 
collect MiC energy consumption data during the construction process and predict carbon emissions. Based on the 
implementation and testing of the system, it has been shown to provide structured and reliable information, fast 
local transmission, and the ability to process information in real-time. The system's ability to coordinate workers 
and process information makes it a valuable tool in smart construction. 


KEYWORDS: Construction management, federated learning, multi-robots, Information management, differential 
privacy, Modular Integrated Construction(MiC). 


1. INTRODUCTION 


The increasing development of the construction industry towards being smart and the concern about management 
informatics stimulate a higher requirement for adopting construction technology. Internet of Things (IoT), 
blockchain smart contracts, and AI in construction are emerging as the next wave in smart construction, with 
examples such as detecting the presence of objects in the construction environment to improve safety (Fang et al., 
2018). The management of complex projects will be efficient, automated, and intelligent with computer vision 
technology (Xu et al., 2021). Prefabrication involves the use of different components from different manufacturers. 
This makes it difficult to standardize data sharing across the industry, leading to a lack of consistency in the data 
that is shared. And Prefabricated buildings, spearheaded by modular integrated construction (MiC) as a future 
trend in the construction industry, can lead to poor data sharing and communication resulting in the uniqueness of 
their supply chain (Wuni et al., 2022). The construction process is different from traditional construction methods. 
Prefabricated buildings are constructed off-site in a factory-controlled environment, where the materials and labor 
are streamlined and optimized for efficiency. This requires a unique supply chain that is focused on the 
procurement and delivery of materials to the factory, as well as the transportation and installation of the finished 
product to the construction site. Construction usually requires the cooperation of many stakeholders, which can be 
divided into four categories according to their functions: client, manufacturer, logistics company, and contractor. 
The process of modular construction involves various aspects of design, engineering, manufacturing, logistics, 
installation, and project management. Each stakeholder brings a unique set of skills and expertise to the project, 
including architects, engineers, contractors, manufacturers, transportation specialists, and project managers. And 
these multidisciplinary stakeholders have different expectations, interests, and motivations, and the plethora of 
participants in a construction project leads to low information transparency, inefficient transactions, and even 
frauds. (Luo et al., 2019) Today’s construction is operating in highly dynamic environments, which requires the 
information facilities to be able to provide stable network services and adequate computing sources in interaction 
with the environment on site. For example, verifying the statuses of the modules can monitor the construction 
progress . At the same time, the high level of privacy and the large amount of information generated during the 
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project management process leads to a decrease in efficiency. In order to effectively tackle the challenges posed 
by inadequate infrastructure and privacy concerns during the construction process, it is imperative to implement a 
robust on-site system architecture that facilitates mobile crowdsensing, shared storage, and processing of 
information from computational sources while ensuring data privacy. This technology is critical for the 
advancement of Construction 4.0, and can only be accomplished with ample computing resources and a reliable 
network. By bringing information technology to this traditional yet modern field, we can revolutionize the 
construction industry. The system can improve information transparency, reduce information asymmetry and 
facilitate collaborative work throughout the construction project lifecycle. Managers can keep track of the project's 
construction progress in real-time, identify problems and solve them in time to avoid causing construction 
problems such as deliveries delay, the absence of workers, machinery breakdowns, etc. Effective information 
sharing can also coordinate tasks between managers and workers, improve communication efficiency and optimize 
the construction process (Jiang et al., 2021). 


The limitations of information technology, such as data transmission, make communication and data visualization 
in construction less efficient (Niu et al., 2015). This has a significant impact on the design of low-energy buildings, 
which require real-time monitoring of the environment and control. Niu et al. proposed A virtual reality integrated 
design approach to improving occupancy information integrity for closing the building energy performance gap 
(Niu et al., 2016). But this approach also generates a lot of private data due to the fact that the attitude of companies 
such as manufacturers towards new technologies depends on the environmental and organizational context (Pan 
& Pan, 2019). Building organizations can have concerns about the use of technologies that contain similar issues. 
In turn, this data cannot be fully utilized in a shared manner. A framework that can break through the efficiency of 
data transfer and address data privacy is necessary for the construction process. 
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Fig. 1. An example of construction information management by federated learning 


Fig. 1 Illustrates the multi-robot horizontal federated edge learning framework for construction information 
Management. The framework enables robots to gather information from workers and vehicles and transmit it in 
real-time through sensors to the robots. The robots then sample the information and process their query on the 
device. For instance, if one frame includes a vehicle, the robot will detect and send back the data to the user while 
another section of the sampled frame will be forwarded to the edge server for object detection, specifically the 
detection of the vehicle. Throughout the video analysis query processing lifecycle, the edge device and edge server 
collaborate to provide the user with detection results. By utilizing this framework, the system can effectively 
manage construction information by collecting and analyzing data from various sources, ultimately improving 
project efficiency, and reducing errors. 
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The increasing development of smart construction and the concern about the application of the IoT in construction 
stimulate a higher requirement for information exchange and communication quality. There are privacy issues in 
the transmission, storage, and analysis of data. These data relate to confidential industry information, such as the 
operations and costs of the companies involved. Institutional restrictions hinder information sharing between 
clients and contractors. Strict regulations undermine the efficiency of information sharing between clients and 
contractors. As factories, shipping companies, and contractors are independent. They cannot share data between 
them, which makes it difficult for project management as a third-party company to organize and coordinate the 
entire construction process. Site coordination requires the exchange of information between different contractors 
and sharing raw data between different contractors may compromise privacy and lead to problems such as 
communication barriers. The lack of information sharing between construction companies and suppliers is a 
significant barrier (Ojo et al., 2014) . Smart Construction is a method of construction that requires coordination in 
construction duration. A highly coordinated construction program of plant, transport, and construction is required. 
However, if clients and contractors do not accept relational matters as a long-term strategy, they will refuse to 
share implicit information (Yan, 2014). 


Insufficient computing power equipment on site due to the complex environment of the construction site and the 
lack of well-established power and communication system. The raw data collected on the edge devices need to be 
uploaded to a cloud server for processing, which requires a lot of data transfer and processing time. For the 
construction of security systems, functions such as path planning and target recognition require real-time 
computing capabilities. While the majority of devices currently possess at least one image sensor and the ability 
to record and play high-resolution videos, they are deficient in the processing power required to execute intricate 
real-time computer vision algorithms. They are still unable to perform high-intensity performance computer vision 
tasks in real-time, with high frame rates and low latency (Honegger et al., 2014). 


Star topology is widely used when the intelligence of the network is concentrated on the central node. However, 
the star topology has many disadvantages (Bisht & Singh, 2015). The star topology has many redundant links to 
ensure high connectivity, which results in high installation and maintenance costs and poor resource-sharing 
capabilities. Also, the communication lines are only used by the central and edge nodes on the line, which lead to 
communication lines being poorly utilized. Nevertheless, the central node demands frequent attention, and if it 
malfunctions, the entire network will come to a standstill. As computing evolves from centralized mainframe 
systems to many powerful microcomputers and workstations, the use of the traditional star topology will be 
reduced. 


The purpose of this study is to introduce a multi-robot federated edge learning framework that can facilitate 
communication and information management in smart construction. The paper highlights the challenges posed by 
the large amount and wide variation of information involved in construction and the spatial dispersion of 
construction sites to the information management system (Akinosho et al., 2020). The paper propose the use of 
unmanned aerial vehicles and quadruped robots as edge nodes to process information during project execution. 
The framework is a worldwide sharing model founded on the principles of differential privacy protection. It gets 
distributed to numerous edge robots during each round, enabling local real-time processing of robot tasks. The 
system can accomplish target detection and tracking of workers based on computer vision, and predict carbon 
emissions. The authors demonstrate that the system can provide structured and reliable information, fast local 
transmission, and the ability to process information in real-time, making it a valuable tool in smart construction. 


This section below sets the scene by giving an overview of the existing initiatives and status concerning 
construction information systems. The rest of the paper is structured as follows: Section 2 reviews the literature of 
federated learning, federated learning in construction, and distributed ledger technology. Section 3 discusses the 
research methodology and object selection. Section 4 presents the system architectures and federated edge learning. 
Section 5 discusses the applications to the construction industry and performance evaluation of the federated edge 
learning framework. Finally, Section 6 draws the conclusions and suggests future research. 


2. LITERATURE REVIEW 
2.1 Federated learning 


Federated learning (FL) is a distributed machine learning scheme based on parallel computing that can overcome 
the challenges of data sensibility and data silos through the collaborative and decentralized neural network 
structure. FL has a high correlation with distributed learning, while it focuses on providing a collaborative model 
without privacy leakage (Li et al., 2020) At present, the FL has two mainstream open-source frameworks. Google 
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(Google, 2019) proposed a TensorFlow Federated (TFF) framework for meeting the demands of deep learning 
services in decentralized data. Webank (“Webank (2019a),” n.d.) presents the first industrial-level framework, 
Federated AI Technology Enabler (FATE) serves for cross-organizational architecture. Furthermore, Ramaswamy 
et al. (Ramaswamy et al., 2019) proposed the prediction of emoji in mobile keyboards, which is a successful 
application for model improvement and Secure Aggregation for the concern of stakeholder privacy. And Yang et 
al. (Yang et al., 2019) divide FL frameworks into three categories: vertical FL, horizontal FL, and federated transfer 
learning. In the case of vertical Fl, data is partitioned in the vertical direction by the feature dimension. Horizontal 
FL is suitable for cases in which data are multiple in sample space and a set number of overlaps among the feature 
of data storage in various nodes. Upon most occasions, data shares are quite distinct from sample space and feature 
space. Therefore, federated transfer learning can solve the problem in this setting is poor data quality without data 
labels (Li et al., 2020). 


2.2 Construction information management 


Having access to data at the right time for construction managers can assist project construction in assessing the 
construction performance of the corporation and subcontractors (Carrillo et al., 2013). When implementing 
construction information management, accurate data recording and comprehensive data analysis help to establish 
the credibility of stakeholders (Yang et al., 2019). Construction is a complex, one-off manufacturing process that 
involves many businesses, including design, manufacture, and transport. To assess the risks linked to each 
stakeholder, unprocessed data is necessary. Forecasting stems from data mining, a result of the methodical 
management of construction data. In addition, based on the data simulations, the manager can predict the potential 
problem area and plan for them (Arayici et al., 2012). With the completion of the first phase, the requirement for 
additional equipment and material in the second phase is identified. The analysis also provides construction 
stakeholders with visual information on project progress (Doloi, 2013). With the better access to and updating of 
the management system, the raw data should not be recorded with simple storage. It also provides data analysis 
servers. The potential for innovative algorithms based on artificial intelligence to support efficient construction 
processes (Pan et al., 2022). Data analytics and IoT are useful for the analysis of construction impact on parameters. 
In construction, construction information management is as important as construction, as information affects all 
construction-related activities (Kim et al., 2013). 


3. METHODOLOGY 
3.1 Problem definition 


The primary focus of this present research is on information processing, analysis, and collation in construction 
scenarios. To achieve the study aim, construction robots have been designed as computational nodes, each 
corresponding to a construction safety monitoring device. These monitoring devices include cameras such as RGB, 
RGB-D, and LIDAR. Construction robots can be integrated as computational nodes, and can assist in the 
construction process while also collecting data and training models locally. This local processing ensures that data 
safety functions can be achieved while analyzing data on-site. In the context of construction safety monitoring, 
construction robots equipped with safety monitoring devices can help ensure safety compliance in construction 
sites. Additionally, they can help identify potential safety hazards, such as structural instability or unsafe working 
conditions. Furthermore, the collected data can be processed and analyzed locally, without the need to transfer 
sensitive information to external servers, ensuring data safety and privacy. By doing so, data can be analyzed in 
real-time, enabling quick decisions and actions to be taken to prevent accidents or hazards. 


By utilizing the FedMR approach, data sharing is minimized, and data privacy is protected while effectively 
predicting the facial fatigue status of workers in real-time. This approach can aid in improving construction safety 
by enabling prompt interventions, when necessary, ultimately reducing accidents and improving the overall safety 
of workers and equipment vehicles in the construction scene. 


3.2 Construction information framework overview 


Construction is a complex process that generates a vast amount of information, as depicted in Fig. 2. The 
construction information is created from the design phase at the start of the project, and all project-related data is 
stored in a distributed manner on the construction site. This includes the management of approvals and material 
information, installation and transportation processes, and the information can be collaboratively managed. 
Through federated learning, all resources can be integrated securely, and machine learning can be used to 
efficiently integrate resources and make plans. Construction projects are complex endeavors that involve a 
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multitude of stakeholders, including architects, engineers, contractors, and subcontractors. 
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Fig. 2. Construction information on the different phase 


Coordinating the information flow among these entities can be challenging, especially given the extensive data 
that requires sharing and processing. Federated learning presents a promising remedy, providing a secure and 
efficient approach for construction information management. It enhances efficiency and accuracy without 
compromising data privacy and security, as multiple parties collaboratively train a model without sharing raw data. 
When well-implemented, federated learning aids stakeholders in early problem identification and informed 
decision-making, ensuring project success. 
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Fig. 3. FedMR system framework overview 


The Federated Multi-Robot (FedMR) framework consists of an edge controller, a series of multi-robot systems, 
and servers, as illustrated in Fig. 3. The data flow involves two routes: the data path and the control path. A user's 
query traverses through the robots and servers in the workflow, which consists of a series of functions. In the data 
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path, construction takes place sequentially from the first function to the last one in the workflow. Following data 
processing through this workflow, the resulting outputs are then delivered back to the user. 


By employing the FedMR framework, construction information can be managed efficiently and securely, enabling 
prompt interventions when necessary and improving the overall safety of workers and equipment vehicles on the 
construction site. 


4. APPLICABILITY OF THE FEDMR SYSTEM 


To validate the viability of the approach, experiments were conducted to detect targets in images using a quadruped 
robot that captured local video data. The system carried out target recognition at the robot's side when the edge 
server retrieved images of vehicles. The training dataset for detecting moving objects in construction sites (MOCS) 
was preloaded into the robot. 


In essence, the system's target detection capabilities were put to the test using real-world scenarios. By setting up 
video data collected by the quadruped robot, the system was able to accurately identify and classify targets within 
the images. The edge server, which received images of vehicles, promptly recognized the target objects, thanks to 
the training dataset preloaded in the robot. The MOCS dataset proved invaluable in training the system to detect 
moving objects in construction sites. By integrating the dataset into the robot, the system was able to recognize 
and classify objects in real-time, even in complex construction environments. This approach demonstrates the 
effectiveness of preloading training datasets into robots, enabling them to perform target recognition efficiently 
and accurately. 


4.1 Environment setting for image test 


The experiment, depicted in Fig. 4, employs YOLOv8 on Pytorch 1.8.1. The NVIDIA RTX3090 GPU serves as 
the platform for training and testing the model. Due to the rough road surfaces present in the construction site, a 
quadrupedal robot was selected as the ideal load-carrying system, as a typical wheeled chassis would have 
struggled to navigate the build environment with ease.The industrial camera used in the experiment is DF100- 
1080P (JIERUIWEITONG), while Unitree Gol and DJI M200 function as edge devices. During the training phase, 
the new model incorporates parts of the pre-trained model from YOLOv8x. Since YOLOv8 and YOLOv8x share 
most of the backbone (block 0*8) and some of the head, it is possible to transfer a wide range of weights from 
YOLOv8x. Leveraging these weights during training can save significant time and computational resources. 
Overall, the use of YOLOv8 on Pytorch 1.8.1, combined with the powerful NVIDIA RTX3090 GPU, enables the 
model to train and test efficiently. Additionally, the use of industrial cameras, as well as the Unitree Gol and DJI 
M200 edge devices, further enhances the experiment's reliability and accuracy. By incorporating pre-trained 
models and weights, the experiment provides a practical approach to target detection that can be easily adapted to 
a wide range of scenarios. 


Fig. 4. Quadruped robot-based edge devices 
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4.2 Object detection in federated edge learning 


borer 07 


Fig. 6. Validated datasets with labels. 


As illustrated in Fig. 5 and Fig. 6, the model trained on the local data training set achieved impressive results in 
target recognition. All targets were recognized compared to the labelled images. However, some of the worker 
targets had smaller IoU values when they were smaller in size. On the other hand, large vehicles were well 
recognized if they were presented as a whole. The best results were obtained for excavators. 


When the edge server makes a query for a relevant target, the query image can be transferred back to the 
corresponding server. This enables the data to be trained and queried under local conditions, facilitating the 
management of information for construction while maintaining data security. By utilizing this method, 
administrators can access the information they need for security, management, etc., without having to access data 
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from other departments. 
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Fig. 7. The result of Fl-scores 


As shown in Fig. 7, the F1(H-mean) score is a combined evaluation indicator of recall and accuracy. The F1 value 
represents the division of the arithmetic mean by the geometric mean, with a higher value indicating better 
performance. Analyzing the Precision and Recall formulas in the context of this reveals that when the F1 value is 
low, there's a relative increase in True Positives and a decrease in False Positives, leading to a relative increase in 
both Precision and Recall. Essentially, F1 is a weighted measure that considers both Precision and Recall. 

2 2*P*R 


1,1 P+R 
PTR 


F1 = 


The P is the precision of the YOLOv8 based on FedMR , and R is the recall of algorithm. 


The worker and crane identification had the best F1 score. The confidence of all categories was 0.72 at F1=0.357, 
which is a significant improvement compared to the YOLOv8 standard of 0.65. This indicates that confidence is 
also guaranteed to be very good at higher recall. This data proves that edge nodes can provide good data processing 
capabilities, and the local model can process the data collected in the field in real-time. With a confidence level of 
0.35 at 0.70 in comparison to YOLOV8 utilizing a non-FedMR framework, it is evident that the FedMR framework 
significantly enhances information processing capabilities. This improvement leads to better performance, even 
when data processing is restricted to a localized environment. 


The FedMR have adapted the original FederatedAveraging (FedAvg) algorithm to our framework as shown in 
Algorithm 1. This aim is to investigate the impact of various data divisions and federated learning settings. To 
achieve this, the framework have modified the FedAvg algorithm to a FedMR algorithm by replacing the server- 
client communication framework, such as SocketIO, with a method that saves and restores checkpoints on hard- 
devices. This simplifies the model aggregation process. However, our implementation can also be effortlessly 
transferred to FedAvg. 


Algorithm | FedMR 


Input: N client parties{C,,};,—1 y,total rounds T , and Server side S; 
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Output: Aggregated Model w 


S initializes federated model parameters, and saves as checkpoint. Client parties {C,},-1.y, load the 
checkpoints. 


for t = 1, ..„T do 
for k= 1, ..,T do 
wp = w 
each client {Cp} do local training: 
fori = 0,1,...,M@do 
(Mk is the number of data batches b in the client Cp) 
client {Ck} computes gradients VP(w , bi) 
update with wg = wy — nVE(Wx, bi) 
end for 
save W, results to checkpoints. 
end for 
S loads checkpoints and get averaged model with w = =YN_, wy 
end for 


Return w 


5. CONCLUSIONS 


FedMR has demonstrated its versatility as a comprehensive model for handling information, optimizing processes, 
and monitoring isolated data. This empowers robots to offer insights, predictions, and warnings tailored for 
individual distributed construction workers prior to task execution at the work package level. However, existing 
machine learning approaches for modeling, optimization, and monitoring in construction information management 
necessitate data sharing or aggregation from each company, posing privacy risks and failing to deliver personalized 
monitoring of construction site conditions. 


Therefore, this paper has proposed a horizontal federal learning framework, FedMR, to aggregate cryptographic 
information data parameters from different building stakeholders without compromising privacy and personalize 
the model differently based on the jobs each robot is responsible for. The model is improved based on the Fedvision 
model, utilizing the multi-robot as the edge device. The testing process is experimentally validated through the 
retrieval of image information by managers, using a target detection approach by training and testing locally. After 
experimental verification, the efficiency of YOLO algorithm can be improved in FedMR framework can be 
improved by 1.5% under the premise of user privacy F1 can be improved. The evaluation results show that the 
proposed FedMR can achieve mainstream YOLO recognition performance, and privacy is well protected. However, 
there is still room for improvement in terms of model performance and scalability. Future research should focus 
on exploring the potential of incorporating more advanced machine learning techniques such as the Large language 
model. Additionally, as the use of robots in construction becomes more prevalent, it is important to continue to 
prioritize the development of secure and personalized monitoring solutions to ensure the safety and efficiency of 
construction processes. 


6. ACKNOWLEDGE 


The work described in this paper was supported by the Research Impact Fund of the Hong Kong Research Grants 
Council (Project No.: HKU R7027-18) and the 43rd Round University Research Committee PDF/RAP Scheme of 
The University of Hong Kong. 


561 


REFERENCES 


Akinosho, T. D., Oyedele, L. O., Bilal, M., Ajayi, A. O., Delgado, M. D., Akinade, O. O., & Ahmed, A. A. (2020). 
Deep learning in the construction industry: A review of present status and future innovations. Journal of Building 
Engineering, 32(101827). https://doi.org/10.1016/j.jobe.2020.101827 


Arayici, Y., Egbu, C., & Coates, P. (2012). Building Information Modelling (Bim) Implementation and Remote 
Construction Projects: Issues, Challenges, and Critiques. Electronic Journal of Information Technology in 
Construction, 17, 75—92. 


Bisht, N., & Singh, S. (2015). Analytical study of different network topologies. International Research Journal of 
Engineering and Technology (IRJET), 2(01), 88—90. 


Carrillo, P., Ruikar, K., & Fuller, P. (2013). When Will We Learn? Improving Lessons Learned Practice in 
Construction. International Journal of Project Management, 31(4), 567-578. 
https://doi.org/10.1016/j.ijproman.2012.10.005 


Doloi, H. (2013). Cost Overruns, and Failure in Project Management: Understanding the Roles of Key 
Stakeholders in Construction Projects. Journal of Construction Engineering and Management, 139, 267-279. 
https://doi.org/10.1061/(ASCE)CO. 1943-7862.0000621 


Fang, W., Ding, L., Zhong, B., & Love, P. E. D. (2018). Automated detection of workers and heavy equipment on 
construction sites: A convolutional neural network approach. Advanced Engineering Informatics, 37, 139-149. 
https://doi.org/10.1016/j.aei.2018.05.003 


Google. (2019). Tensorflow federated. https://www.tensorflow.org/federated 


Honegger, D., Oleynikova, H., & Pollefeys, M. (2014). Real-time and low latency embedded computer vision 
hardware based on a combination of FPGA and mobile CPU. In 2014 IEEE/RSJ International Conference on 
Intelligent Robots and Systems, 4930—4935. https://doi.org/10.1109/IROS.2014.6943263 


Jiang, Y., Liu, X., Kang, K., Wang, Z., Zhong, R. Y., & Huang, G. Q. (2021). Blockchain-enabled cyber-physical 
smart modular integrated construction. Computers in Industry, 133, 103533. 
https://doi.org/10.1016/j.compind.2021.103553 


Kim, C., Son, H., & Kim, C. (2013). Automated Construction Progress Measurement Using a 4D Building 
Information Model and 3D Data. Automation in Construction, 31, 75-82. 
https://doi.org/10.1016/j.autcon.2012.11.041 


Li, L., Fan, Y., Tse, M., & Lin, K. (2020). A review of applications in federated learning. Computers & Industrial 
Engineering, 149, 106854. https://doi.org/10.1016/j.cie.2020.106854 


Luo, L., Shen, Q. G., Xu, G., Liu, Y., & Wang, Y. (2019). Stakeholder-associated supply chain risks and their 
interactions in a prefabricated building project in Hong Kong. Journal of Management in Engineering, 35(2), 
05018015. https://doi.org/10.1061/(ASCE)ME. 1943-5479.0000675 


Niu, S., Pan, W., & Zhao, Y. (2015). A BIM-GIS integrated web-based visualization system for low energy building 
design. Procedia Engineering, 121, 2184-2192. https://doi.org/10.1016/j.proeng.2015.09.091 


Niu, S., Pan, W., & Zhao, Y. (2016). A virtual reality integrated design approach to improving occupancy 
information integrity for closing the building energy performance gap. Sustainable Cities and Society, 27, 275— 
286. https://doi.org/10.1016/j.scs.2016.03.010 


Ojo, E., Charles, M., & Esther, T. A. (2014). Barriers in implementing green supply chain management in 
construction industry. Proceedings of the 2014 International Conference on Industrial Engineering and Operations 
Management, Bali, Indonesia. https://doi.org/10.18485/epmj.2020.10.2.5 


Pan, M., & Pan, W. (2019). Determinants of adoption of robotics in precast concrete production for buildings. 
Journal of Management in Engineering, 35(5), 05019007. https://doi.org/10.1061/(ASCE)ME.1943- 
5479.0000706 


Pan, M., Yang, Y., Zheng, Z. J., & Pan, W. (2022). Artificial Intelligence and Robotics for Prefabricated and 


562 


Modular Construction: A Systematic Literature Review. Journal of Construction Engineering and Management, 
148(9). https://doi.org/10.1061/(ASCE)CO.1943-7862.0002324 


Ramaswamy, S., Mathews, R., Rao, K., & Beaufays, F. (2019). Federated learning for emoji prediction in a mobile 
keyboard. http://arxiv.org/abs/1906.04329 


Webank (2019a). (n.d.). Federated AI Technology Enabler. (FATE). https://github.com/weban kfintech/fate 
Accessed 2019 


Wuni, I. Y., Shen, G. Q. P., & Mahmud, A. T. (2022). Critical risk factors in the application of modular integrated 
construction: A systematic review. International Journal of Construction Management, 22, 1-15. 
https://doi.org/10.1080/15623599.2019.1613212 


Xu, S., Wang, J., Shou, W., Ngo, T., Sadick, A. M., & Wang, X. (2021). Computer vision techniques in 
construction: A critical review. Archives of Computational Methods in Engineering, 28(5), 3383-3397. 
https://doi.org/10.22260/ISARC2019/0090 


Yan, N. (2014). Quantitative effects of drivers and barriers on networking strategies in public construction projects. 
International Journal of Project Management, 32(2), 286-297. https://do1.org/10.1016/j.14jproman.2013.04.003 


Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM 
Transactions on Intelligent Systems and Technology, 10(2), 1-19. https://doi.org/10.1145/3298981 


563 


ROBOTIC ASSEMBLY AND REUSE OF MODULAR ELEMENTS IN 
THE SUPPLY CHAIN OF A LEARNING FACTORY FOR 
CONSTRUCTION AND IN THE CONTEXT OF CIRCULAR ECONOMY 


Jochen Teizer, Kepeng Hong, Asger D. Larsen & Marcus B. Nilsen 
Department of Civil and Mechanical Engineering, Technical University of Denmark, Denmark 


ABSTRACT: Although robotic solutions have been making significant contributions to fabrication environments, 
implementations in the construction are rare. It seems a disconnect between the industries exists where in 
construction the high number of non-uniform work tasks, the wide assortment of types and shapes of building 
materials and elements, and the presence of human workers creating safety hazards make the deployment of rather 
rigid robotic manipulators on construction sites much more complex than in production-like work environments. 
To advance construction with robotic solutions, it could prove beneficial to make each sector aware of the barriers 
that exist, and likewise, introduce a physical space for joint experimentation with state-of-the-art technologies 
from both fields. One way of alleviating this issue is to connect the sectors by providing hands-on education and 
research experiences, defined hereby as Learning Factory for Construction (LFC). This paper presents a scaled- 
down version of a LFC that has a robotic manipulator perform fully-automated and precise assembly, 
deconstruction, and reuse tasks of modular construction elements, whereas the elements are tracked with fiducial 
markers according to a known building information model and schedule. Furthermore, the FLC continuously 
gathers and analyzes data for performance, measures successful completions, assembly times, and potential 
quality defects. This project involved Masters level students with domain expertise from architectural, civil, and 
mechanical engineering in a cross-disciplinary and collaborative learning exercise of building a working 
prototype within a semester-long study project. Beyond the core tasks of the digital design and robotic application, 
the group developed theoretical concepts and limitations for more holistic views on circular economy, lean 
production, on- and off-site logistics, modularization, and construction safety, just as expected from a LFC. It is 
anticipated that the next generation of professionals working in the built environment and intending to solve some 
of the larger and more complex societal problems will require both the technical and communication skills that a 
LFC can stimulate. Therefore, LFC is expected to become an important component of active learning environments. 


KEYWORDS: Active learning environment, automation and robotics, building information modeling, circular 
economy, human-machine interaction, learning factory for construction, modular construction, next-generation 
tech-savvy engineers, rapid prototyping and testing, renovation, reuse of materials. 


1. INTRODUCTION 


For the past decades there has been an increased interest in robotic technology in construction applications. 
Economic projections foresee a prospering field and actual widespread usage in practice, requiring new policies 
and rules across the impacted industries (EC, 2022). However, many challenges still present themselves regarding 
robots in construction. Simple tasks that prove easy to execute for humans, prove extremely difficult for robotic 
manipulators due to a lack of perception and cognitive abilities. The size and weight of robots in industrial work 
environments, often tackling singular and highly repetitive tasks, does not fit the challenging, complex, and highly 
dynamic work environment that exists in construction sites. Yet, finding the necessary functionality and usability 
are a few of the additional barriers that exist and prevent robots from mainstream implementation. Despite some 
recent and rather serious interest from the industry, robotic applications in construction have stayed limited to 
niche research or exploration projects. Automated and robotic brick laying machines (Usmanov et al., 2017; Ravi 
et al. 2021) and additive manufacturing are some examples (Teizer et al., 2016). 


To enable the use of robotics, suitable methods to assist the robots are necessary to consider. Yet, they are difficult 
to develop as construction touches a multi-disciplinary field that makes it challenging to find acceptable solutions. 
A few somewhat isolated disciplines (and stakeholders) are: design (architects/planners), construction (civil 
engineers), machinery (mechanical engineers), and systems and processes (industrial engineers). While innovation 
in any field, like in construction, calls for lifting the boundaries between these domains, a further major aspect to 
consider before introducing robotic applications in construction is to maintain a high level of trust, productivity, 
and safety in new technologies (EC, 2022). 


Change to fabrication environments came over decades, with fully-automated solutions replacing isolated and 
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highly repetitive work tasks humans would not endure. The typical construction work environments yet may 
demand a similar time frame and even more. For example, active human-robot collaborations are supposed to 
solve the sector’s rather complex and interconnected work tasks. The involvement of multi-trades’ expertise and 
the manifold types of product or material specifications constitute a few of the other but plentiful technical 
challenges that semi- or fully-automated robotic solutions are envisioned to solve before decision-makers would 
buy into them for final field use (Slaughter, E.S., 1998; Goodrum & Haas, 2016) 


Yet, the effects that a transition to robotic labor would have on construction can include improvements to 
construction industry-wide problems, including but not limited to achieving higher productivity and better safety 
and health performances. As such, prioritization of human time and purpose of life and health, and ease of system 
installation and maintenance, to name only two criteria, reflect the current construction industry’s efforts towards 
digitalization, automation and robotization (Yamamoto, 2020). 


The concept of a Learning Factory (LF) is not new, and yet they hardly exist for construction purposes. Teizer and 
Chronopoulos (2022) expressed that a Learning Factory for Construction (LFC) can provide a useful active 
collaborative working environment for engineers that are interested in exploring prototypical solutions that have 
the potential to solve known industry problems. In their articulated vision, a LFC provides the explorative 
collaboration space to (a) detect the organizational barriers that prevent innovation, (b) allow objective and scope 
definitions by understanding the technical limitations in existing work processes, and (c) create prototypical hard- 
and software solutions that can be tested on small but at realistic scale and with little risk of losing large investments. 
Gaining knowledge in a LFC first is required to later adapt solutions to a larger workspace and with increased 
autonomy. And yet, students that participate in a LFC should have fun, like Teizer et al. (2020) and Wolf et al. 
(2022) found out when observing construction apprentices that played serious games for construction safety. 


As the widespread application of robotics in manufacturing industries has significantly improved productivity and 
efficiency, there has been significant research interest in construction robotics. Besides reducing project delivery 
delay, construction robotics can benefit the workers by assisting them with non-ergonomic tasks (e.g., lifting 
weights) and taking over dangerous activities (e.g., demolition). However, the implementation of construction 
robotics heavily relies on manual input from task to task due to the complicated nature of construction activities. 
For instance, the difference between as-built and as-designed models during the construction stage can be 
challenging for preprogrammed construction robots to understand the changing environment at the construction 
site. Only in combination with a higher level of digitalization construction robotics can it be effectively 
implemented for automated or semi-automated construction. Emergent methods and technologies, such as BIM 
and vision-based object recognition, collaborate with construction robotics to complete the workflow of automated 
construction. Such collaboration requires various fields of engineering to understand all involved technology and 
the interplays between these technologies. Figure 1 integrates many of the currently existing digital technologies 
and how they relate to each other. Highlighted in grey background color are those that are part of this LFC. 
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Fig. 1: Overview of relation between digital technologies; modified, originally from EC (2021), 


The goal of this paper is to demonstrate the viability of the integration of a robotic manipulator into a LFC, 
displaying the advantages of using modular components in the context of autonomous construction in a circular 
economy. The following sections first review the background, then introduce the developed LFC, a scaled-down 
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version of a building construction site, and finally demonstrate its capabilities in a case study where a robotic 
manipulator handles modular elements for building assembly and reuse under some of the typical real constraints 
that exist in the construction supply chain and in a circular economy. 


2. BACKGROUND 


Several existing cases have shown that assembly processes using robotic manipulators are favorable. Wang et al. 
(2020a) stated that robotic construction was both faster and more accurate compared to conventional manual 
methods by construction workers alone. However, it was stated that robotic solutions were limited to being either 
conducted in non-complex work environments or limited to handling specified objects. In other words, robotic 
solutions still required some aspects of human labor to complete complex activities. These limitations pose some 
of the biggest challenges to overcome. 


The recently-completed research project HEPHAESTUS proved successful in installing curtain wall modules 
using a large-scale cable-driven robot alongside a robotic manipulator, but many improvements were left to be 
implemented (Iturralde et al., 2020). 


Using robotic manipulators for construction has shown to be easier executed when introducing the modular and 
parametric design in assemblies of complex constructions. Research on modular design for robotic construction 
showed that it is possible, using modular components, to verify the design and construction process through 
simulations (Sun et al., 2022). 


Using timber panels, which are identified by computer vision and machine-readable QR codes, has proved to make 
it possible for a robot arm to do insertions of panels to create simple assemblies (Rogeau et al., 2020). In addition, 
a robot arm using standardized timber was able to construct complicated structures with high precision. Results 
from Leng et al. (2020) showed the benefits and possibilities of utilizing standardized materials, with precise 
parameters being a key factor. A similar research with timber addressed the issues of wood being a natural and 
imprecise material which complicates handling, highlighting the issues of production tolerances (Hasan et al., 
2019). 


The notion of having robots build from a digital model has been investigated in several papers, with a focus on 
exporting Building Information Modeling (BIM) to a robot from an as-designed model or importing the physical 
as-built model for guidance purposes. Likewise, using a Digital Twin for Construction (Sacks et al., 2020), the 
process of having a robot build from a BIM and updating the as-built model using sensor data was validated (Wang 
et al., 2020b). At the moment, software packages are being developed to help link BIM-based design with robot 
control which could ease the process of future digital-to-physical model building (Yang et al., 2019). Slebicka et 
al. (2021) placed an important vision for Fabrication Information Modeling (FIM) that intends to close the gap 
between BIM and Digital Fabrication that, at some point in time, will heavily depend on automation and robotics. 


While only a small amount of the above-mentioned research addresses the interaction between autonomous robots 
and human workers, human-robot interaction proves to be detrimental when considering on-site safety (Wu et al. 
2020). With the prospect of robots in construction, it is recommended to also investigate their social impact since 
the potential changes to workplaces will require workers to acquire additional skills, competencies and 
responsibilities (Karl et al., 2018). A proposed method of tackling safety is to introduce the concept of an LF, which 
emphasizes hands-on experience. LFs offer a high potential to improve education, training, and research in a 
controlled environment (Abele et al., 2017). 


Gharbia et al. (2019) concluded that rapid prototyping assisted in the creation of robotic solutions. In this context, 
introducing a robot manipulator into a LFC would educate on, and increase the awareness of the capabilities of 
autonomous robots in a construction work environment and help involved project stakeholders (engineers and 
workers) adapt increasingly advanced technologies to a construction site. 


Related to this effort, robots of different sorts have already been introduced to various Learning Factories in 
relation to Industry 4.0. Several researchers have included robotic arms in their own specialized Learning Factories, 
albeit with a focus on manufacturing and assembly (Matt et al., 2014; Kaménzy et al., 2018; Nardello et al., 2017). 


As with Industry 4.0, the recent technological advances will gradually replace the roles of humans in construction, 
in what is coined Construction 4.0 (Sawhney et al., 2020). However, the challenge of integrating a robot into a 
LFC environment with the purpose of improving construction processes is yet to be investigated. 
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3. ROBOTIC MANIPULATOR IN A LEARNING FACTORY FOR CONSTRUCTION 


The first part of this section briefly explains the relevant backgrounds of the research methods employed in this 
work of Masters-level students in a semester-long study project that utilizes the LFC at the Technical University 
of Denmark. Next, the hard- and software components of the LFC are introduced. Experiments and results follow 
with a discussion summarizing the lessons learned at the end. 


3.1 Introduction to components 
3.1.1 Learning factory for construction 


LF has proven to be an effective way to provide active hands-on learning (Abele et al., 2017). LF typically 
reproduces or simulates a production environment, allowing participants to gain practical knowledge and skills in 
a controlled setting. In addition, LF also provides a platform for researchers to investigate and improve processes 
and workflows. In this case, our LFC is meant for university students but can also involve apprentices or full-time 
professionals, like workers, technicians, and engineers from the construction industry. The purpose of our LFC is 
to provide the physical space that facilitates education and research on automation and robotics in construction. 


3.1.2 Robotic manipulator 


A robotic manipulator performs tasks as a human arm (Matt et al., 2014). In our case, the robot mimics a mobile 
or tower crane on a construction site. Our LFC consists of multiple modules that include robotic elements, of which 
only the robotic manipulator URSe, its mounted camera and gripper, and a computer will be explained in the 
further text. Details of the other existing components of our LFC can be found in Teizer and Chronopoulos (2022). 
While these eventually will be connected to each other, this paper introduces the part of the LFC that simulates the 
process of three steps in automated modular construction: automated assembly, disassembly, and reuse according 
to a BIM-based building design. The URSe is made of several interconnected segments, has joints, and one end- 
effector, allowing it to make rotary and linear movements. The end-effector performs assigned tasks at any position 
within the spatial coordinates of the robotic arm. The learning factory uses URSe as the robotic arm and mounted 
gripper as the end-effector so that it can grab, lift and place any given components at the assigned positions. It has 
six degrees of freedom (x, y, z, roll, pitch, yaw), whose value can be changed so that the gripper mounted at the 
end of the arm can be moved to desired position and orientation. There exist three types of movements, moveJ (the 
robot moves each joint independently), moveL (the robot moves in a straight line), and moveP (the robot moves 
following the designed path). 


3.1.3 Building information model and construction schedule 


BIM is a comprehensive and collaborative method across the whole building life cycle (Oraee et al., 2017). Yet, it 
has less been used in combination with automation and robotics than other applications. Our LFC uses 
commercially-available BIM software for the manual design of a fictive modular building project and, likewise, 
is the sequence of constructing the modular elements planned digitally. While this may imply a detailed 
construction schedule comprised of the precise timing and dependencies of the construction tasks, only the Work 
Breakdown Structure (WBS) is needed. The BIM software is also used for visualization purposes. Otherwise, the 
IFC format contains geometry and position information for each of the modular elements and the task sequence. 


3.1.4 Building materials 


The building is constructed with standardized physical models using the URSe. The pieces are made from 
lightweight plastic and come in several shapes. 


3.1.5 Object detection 


In order for the robot to handle the modular elements, object detection and recognition with final localization is 
required. Object detection is made possible by computer vision that identifies and localizes the modular elements 
of next interest within the video frame capture. There exist two main approaches for object detection, traditional 
(e.g., rule-based, handcrafted features) and deep learning-based approaches. Out LFC integrates traditional object 
detection algorithms for construction sites and modular component detection. 


3.1.6 |Human-robot interaction 


Human-robot interaction happens only twice in this part of the LFC: first, to place new modular elements in the 
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arrival area to the simulated construction site that is within range of the robot arm, and second, when the building 
owner makes a choice of selecting a building design and floorplan. Otherwise, the developed module of the LFC 
operates fully autonomously, as explained in the following. 


3.2 Methods 


The framework of the LFC is shown in Figure 2. It comprises two parts. The first part is a remote control, where 
students in the role of an architect or civil engineer upload their IFC file to the computer. Note that the modular 
building designs of the students allow some variation but still follow the material specifications and parameters 
that were given to them beforehand. The computer extracts the geometry and position information of each modular 
building component and determines the construction sequence. The spatial and temporal information of 
construction is translated into robot commands and then sent to the robot for controlled execution. The manipulator 
first scans the entire site to locate and map the coordinates of the material pick-up, the temporary depot, and the 
construction zones. When ready, the robot finally receives the building commands to start the construction process. 


Robot commands 
la ere *=— Robot controller 
@, Jj Image. | 


` 
| IFC file ` 
` 


A Architect 


&Engineer 


Fig. 2: Learning Factory for Construction (LFC) at the Technical University of Denmark: Construction site 
module. 


The hardware and software requirements and descriptions for the learning factory are listed in Tables 1 and 2. The 
BIM translation and robotic remote control are implemented in a Python environment due to its simplicity and 
extensive library support. 


Table 1: Hardware in the construction site module of DTU’s LFC. 


Equipment Description 


Robotic manipulator URSe for grabbing, lifting, and placing building components 
Camera on end-effector OnRobot RGBD camera for object detection and as-performed data collection 


Computer Processing BIM files, translating commands, receiving and processing data, control 


Table 2: Libraries and software for the learning factory 


Library and applications Version Description 

URX 2.0.1 URSe remote control and program execution 
IfcOpenShell 1.6.1 IFC file translation and querying 

OpenCV 20.10.22 Visual detection ad recognition 

BlenderIFC IFC file editing and viewing 


For reliable object detection, the camera uses fiducial marks attached to the construction site and building 
components to recognize the different objects, as shown in Figure 3. 
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A 


(a) (c) 
Fig. 3: Examples of (a) a zone and (a) modular building elements, all marked with different fiducial marks for 
object recognition. Note. The elements are further detected by shape and color. 


3.3 Implementation and preliminary results 


The preliminary implementation of this component of DTU’s LFC is shown in Figure 4. The layout and setup 
follow the concept mentioned earlier. For a simplified demonstration, a basic two-story building consisting of 5 
IFC elements is designed, shown in Figure 5. The five components are positioned at distinct heights so that the 
algorithm can easily sort the order of construction. Figure 6 shows the simplified modular building elements to 
which each unique fiducial markers are attached. Existing computational algorithms later detect and recognize the 
fiducial marker when it is within the field-of-view of the mounted camera on the end-effector. 


E © 1FCBUILOING (5) o 
] v Ground (1) e 
p ifcStab (1) e 

> Foundation (1) eo 

p IfcWall (1) e 

v Level 1(2) eo 

p IfcStab (1) o 

» lfcwall (1) o 

v Level 2 (1) e 

p ifcSlab{1) o 

Fig. 4: Experimental setup of LFC. Fig. 5: IFC model of the 2-story modular building project 


(bye (c) 
Fig. 6: Modular components: (a) foundation, (b) wall, and (c) floor. 


Figure 7 illustrates the construction process. After the students load their IFC file, the computer extracts the spatial 
information of building components and determines the construction sequence. The corresponding list of the 
modular elements and the building sequence is shown in Table 3. The robot registers the coordinate of the 
construction sites by using the mounted camera to detect the fiducial markers of the zones. After the coordinate 
system is registered, the robots start to detect, grab, lift, and place the modular construction elements iteratively 
until the last component is assembled onto the building. Reversely, the disassembly process can also be achieved. 
All modular elements are taken apart and placed in a temporary storage zone (called depot). Next, human 
interference is needed if the next phase of the building lifecycle is of interest to the student. The student can choose 
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to select an alternative building design, upload it, and the new building process can start again. Note, whenever 
possible, the robot reuses parts of the modular building elements lying in the depot. Once the second building is 
completed, typically, the LFC experience stops. 


1. Load the IFC file and find 
elements 


2. Sort the assembly order of 
the elements 


3. Register the pick-up, 
depot, and construction zone 


Repurpose the disassembled elements 


5. Grab, pick up, and place 
building elements until all are 
assembled 


6. Disassemble the building 
elements if required 


4. Identify the building 
elements 


Fig. 7: LFC-workflow of the robotic manipulator module: First the design, then the robotic assembly and 
disassembly, optional: re-design and -use. 


Table 3: List of the modular building elements and example of a building sequence by the ascending z coordinate 


Number Elements Coordinates Sequence 
1 'Foundation:297060' (x1,y1,z1) 1 

2 'Floor:301328' (x2,y2,z2) 

3 'Floor:301575' (x3,y3,z3) 5 

4 "Wall:3048 10" (x4,y4,z4) 2 

5 "Wall:305546' (x5,y5,z5) 4 


As the algorithm is sorting the construction sequence by a bottom-up approach, it may only be viable for modular 
building elements with simplistic spatial relations. The construction order can be determined using a predefined 
construction schedule in 4D BIM. For each IfcElement, IfcTask and IfcRelAssignsToControl are attributed so that 
the algorithm can understand the predecessor of each step and validate the correct order during the construction 
stage. 


While, iterative occurred during the system’s development, demonstrating that the entire workflow was tested 5 
times in front of a small audience from industry and academia. Although no strict scientific verification and 
validation methods were ready at the end of the semester project, the students were able to run the system two 
times successfully from start to end. Twice the students assisted by snapping an element (one floor and one wall 
element, in separate tests) into place with a very slight push of an index finger. Once the robot stopped after 
assembly the first design. The reason is still unknown. Yet, runtime data from the system was recorded during all 
test runs and is being processed at this time (and will be implemented in the final version of this paper). 


Fig. 8: Impressions from final demonstration exercises: Robot manipulator completing the fully-automated (a) 
assembly of the first modular building design, (b) disassembly and temporary storage, and (c) re-use of modular 
elements for assembly of second building design (manual selection, after disassemble). 
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4. CONCLUSIONS AND OUTLOOK 


This work is the result of a semester-long study project that exposed four Masters-level students, one in 
architectural, one in civil, and two in mechanical engineering, to backgrounds that they had not learned before. 
For example, both the architectural and civil engineering student had no previous experiences with the field of 
automation and robotics, and likewise had mechanical engineering students neither a background in design or 
planning with 4D BIM nor any expertise in modular construction. The developed concept of a LFC has been 
partially validated, as the robot manipulator was able to follow a digital design and sequence to erect, disassemble, 
and rebuild a small-scale building while applying constraints that exist in a circular economy, for example, making 
as much use as possible of reusing building material. However, as observed, the limited project time that was given 
to the students restricted their curiosity in exploring additional research domains, for example, planning for 
alternatives, generative costing, digital twinning, and testing usability. In the future, a focus on qualitative and 
quantitative assessment methods must be set to evaluate both the students’ and LFC’s performances. Yet, the 
students’ claimed new knowledge by applying their own expertise and discovering other fields. Furthermore, the 
experienced hands-on experiences with respect to realistic and still basic implementations of information modeling, 
computational coding, automation, and robotics, strengthened their learning. It is envisioned that the construction 
industry will benefit from students with such skill sets that a LFC is able to develop, share, or enhance. Yet, scaling 
up the developed concept of a LFC could yield future insights how digital building design can guide real-life 
automation and robotic applications in construction. 
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ABSTRACT: The construction industry is undergoing a significant shift in how design and production are 
conducted. Building Information Modeling (BIM) has emerged as a key tool for coordinating information from all 
involved disciplines and providing a more holistic view of the construction process. However, effective 
coordination and communication between different professions remain major challenges that require new 
approaches to project management. Takt planning has gained increasing attention as a potential solution to 
improve traditional planning methods. Despite this, there is a lack of real-world studies exploring BIM and takt 
planning where information is structured according to takt planning. A takt planning structure for all BIM-models 
would bring a more holistic understanding of what is to be done, controlled, and reported back. To address this 
gap, this paper presents findings from a three-stage research process. Firstly, form a focus group of disciplines to 
find a shared structure to present the execution in a common way for design and construction in a lab environment 
at a conceptual level, secondly implementing it to the detailed design information for real -world case project in 
workshops and group meetings with the focus group and then thirdly, evaluate it in the case project with the site 
staff involved. The findings highlight the importance of a shared denominator to get a holistic approach to project 
management and enabling takt planning throughout all phases of construction, providing insights into its practical 
application and benefits for the construction industry. 


KEYWORDS: WBS; Building information modeling (BIM); Project Management; Takt planning. 


1. INTRODUCTION 


The construction industry is undergoing a significant shift in how design and production are conducted, this also 
affects how projects are documented and handed over once finished. This shift can be seen as a digital 
transformation, with a strong focus on technologies (e.g. Howard et al., 2002; Samuelson & Björk, 2014). A key 
in this transformation is the emergence of Building Information Modeling (BIM) for coordinating information 
from all involved disciplines and providing a more holistic view of the construction process (Azhar, 2011; Sacks 
et al., 2018). BIM can also alleviate information loss that occurs in conventional non-digital workflows (Borrmann 
et al., 2018, Chapter 1; Sacks et al., 2018). Thus, BIM is seen as a major contributor to the digital transformation 
of the industry (Samuelson & Stehn, 2023). While there are some projects moving from drawings towards a model- 
based construction and process (Disney et al., 2021; Gaunt, 2017), there is still a reluctance to fully adopt BIM 
and thus slowing change. 


One factor identified as barrier to change is the fragmentation and high specialization of the construction industry, 
where a disconnect between design and construction phases contributes to the fragmentation (Cerezo-Narvaez et 
al., 2020; Mohd Nawi et al., 2014), and the prevailing project conditions preserves roles, processes, value chains 
and working methods within individual companies and prevents change (Samuelson & Stehn, 2023). Traditionally, 
construction projects mostly follow a waterfall principle where information in each phase is adjusted and modified 
for respective phase (Leicht et al., 2020). Furthermore, the high fragmentation and specialization amongst 
subcontractors is identified as potential factor for projects overshot budgets and schedule overruns occur (Nepal 
& Staub-French, 2016). Work Breakdown Structure (WBS) can help in defining and structure the project (Makarfi 
Ibrahim et al., 2009). Cerezo-Narvaez et al. (2020) stresses that by using a well-developed Work Breakdown 
Structure (WBS), that integrates the Cost Breakdown Structure (CBS), a more representative project schedule and 
budget can be produced, as well as project roles and responsibilities can be assigned to subcontractors more easily. 
Furthermore, Cerezo-Narvaez et al. (2020) also emphases that there is a lack in alignment between WBS and CBS 
and that a more structured work management is essential, especially in the digital management of projects. Thus, 
a standardization of classifications could enable integration of the WBS and CBS and ensure a connected 
information flow. Jung and Kang (2007) notes that standardization of the WBS could contribute to a wide set of 
project control systems, such as scheduling, cost control, materials management amongst other construction 
business functions, this confirms similar conclusions shown in Garcia-Fornieles et al. (2003), which also adds 
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responsibility assignment and information management to the list. Therefore, there is a need to find shared 
information and classification structure to enable a flow of information from design to production and all the way 
to operations and maintenance (O&M). There is a lack of structuring of this in BIM data between different 
disciplines such as planning, scheduling and cost control (Cerezo-Narvaez et al., 2020; Makarfi Ibrahim et al., 
2009). Makarfi Ibrahim et al. (2009), concludes that a standardized WBS structure is missing and proceeds to 
develop and present a structure fitted to the UK construction sector, they also note limited possibilities for 
generalization of this structure worldwide and that WBS structures should be developed contextualized to 
respective market. 


With regards to planning and control, standardized processes have been proven to be beneficial (Haghsheno et al., 
2016), along with a BIM-model, the project can be divided into identifiable repetitions where Takt planning can 
aid in the communication and implementation of the schedule (Viklund Tallgren et al., 2022). With the rise of the 
use of BIM-models, research points towards a possibility to improve information flow between design and 
construction phases as well as improved communication and collaboration within phases, especially during 
planning and scheduling (Crowther & Ajayi, 2019; Nepal & Staub-French, 2016; Viklund Tallgren et al., 2021). 
However, there is a need for a more systematic approach to the coding of models to be able to use them throughout 
design and construction phases. 


Thus, there is a need to support processes spanning over both design and construction, and through to O&M. Both 
internationally and nationally there are numerous examples on standards to address increased digitalization in 
construction, such as CoClass which is supposed to replace the older BSAB in Sweden, Cuneco Classification 
System in Dennark, Uniclass in the UK and the North American Architectural, Engineering and Construction 
industry system OmniClass for example (Cerezo-Narvaez et al., 2020; Eckerberg et al., 2016). CoClass was 
developed to carry information about classes, properties and activities connected to different construction related 
processes. A sub-set of information created in one such process is a model view definition (MVD), governed by 
the information delivery manual (IDM). 


Thus, the aim of this research is to investigate what information is needed for production control and management 
to integrate from the design phase to the construction phase and how this information should be structured to help 
understand the project and its challenges better. This paper proceeds with this question, and addresses the general 
question through the following research questions: 


1. Which stakeholders need to be involved from design and construction phases to find a shared common 
information and coding structure? 

2. What are the challenges with a shared information and coding structure for design and construction 
phases? 

3. How are the developed shared information and coding structures utilized in an actual construction 
project? 


This paper is structured as follows; a review of related works connected to WBS, classifications and planning is 
shortly presented, followed by an account of the results with regards to the research questions. As a result, from 
the focus groups a shared coding structure was identified that extends current classification. The discussion shows 
that the addition of deliverables and construction scope could be used throughout the phases of the project and was 
found to help communication between disciplines and construction phases. The findings highlight that this type of 
structured information enables the prerequisites needed to increase digitalization and integration between 
disciplines, enabling for example a more holistic Takt planning. 


2. METHODS 


This research uses a qualitative approach to explore the three research questions. A combination of methods has 
been used to gather data. A brief overview of these methods is followed by a more in-depth description later in this 
section. The research was instigated through an identification of the critical stakeholders to bridge the design phase 
and the construction phase. These stakeholders formed a focus group. The focus group combined workshops with 
focus group meetings to capture context and initial requirements for the information structure. These initial 
meetings informed the research and development process, and two additional stages were decided to be added. 
The second stage concentrated on implementing and evaluating information that formed the structure. The third 
stage expanded the group and focused on the effects of the proposed structure. The aim of the research was to 
investigate what information was needed for production control during the stages from design to construction and 
how this information should be structured to help understand the project and its challenges better. 
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The focus group meetings were documented with meeting minutes and field notes. Throughout the research one 
single project, Hovas Tak, was used as a case. 


Fig. 1: The Hovas tak apartment project (Hovas tak, Nordr, 2022) 
2.1 Case Study — Hovås Tak — An Apartment building 


The project used in the research was an ongoing construction of an apartment building, located in southern 
Gothenburg, Sweden. The house is an apartment building with two co-joined tower-blocks forming a single body, 
se Fig. 1. Each stairwell has four apartments on each floor with a total of 59 apartments. The building has a total 
gross area of 5170 m? with a framing of precast walls and slabs, with light steel infill walls and a non-load bearing 
brick façade. 


2.2 The three research stages 
2.2.1 Stage 1: Identifying a Coding Structure and Required Information 


There are a lot of people involved in a construction project, but not everyone has the same degree of influence on 
how the information is structured. In the case project, Hovas Tak project, the disciplines that account for the most 
decisive and or governing amount of information for the design and construction management were analyzed, and 
a focus group was formed with these disciplines to try to find a shared structure for the information. 


The focus group of ten people was represented by following disciplines: 


e Client — controls the vision, the purpose of the project and what is to be built. Control the names and 
designations of the different parts of the building as well as documents related to the project. 

Design manager and BIM coordinator — controls the information and information structure from the designers. 
Cost manager — advises over the content of the calculation and how it is structured. 

Scheduler — controls the structure of the schedule, the content is developed together with the site management. 
Site management (Project Manager, Site Manager, Site Engineer) — advises over construction planning, 
logistics, purchasing, site layout plan. (i.e., the overall structure of how the project will be executed, in the 
more detailed planning those involved in the module will be involved) 


Through workshops, the focus group tried to find ways to group and or sort the information in a way that would 
primarily facilitate construction planning and scheduling. The workshops were used to define coding structure and 
the definition of the designations “deliverables” and “construction scope” used in the case project, Hovås Tak. 


2.2.2 Stage 2: Implementing and Evaluating the Coding Structure and Information 


The participants in the focus group were also active in the development of the detailed design documents. This 
took place in parallel with the second stage of the focus group workshops. The workshops also served the purpose 
to create consensus of the boundaries between the deliverables and the construction scope. The designations were 
documented continuously to create a uniform project language. 


2.2.3 Stage 3: Effects of Using the Coding Structure 


The construction documents were completed in Q1 2023, and construction started in early 2023. The focus group 
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then begun to evaluate the effects of the developed shared information structure. The group also studied how the 
structure affected the understanding of the project documents and how the structure was used for the more detailed 
construction scheduling and documentation. An expanded focus group that also involved the main contractor, 
Skanska’s entire site organization in the case project, Hovas Tak, was established, and in a workshop the group 
identified the changes brought about with the newly introduced coding structure. 


3. RESULTS 


The workshops and focus group meetings showed that the waterfall principle was used where the information 
structure is adapted to each discipline on the way downstream. Upstream traceability was secondary as the focus 
were identifying functional requirements and compiling the cost and steering towards the set budget. During this 
process, it was recognized that each discipline’s description of the construction work was optimized for its 
discipline. The information was statically presented in the form of reports, drawings, and 3D models. The site 
management’s planning during this stage was mostly highlighting, extracting relevant amounts of information to 
describe the step-by-step execution of the building via a detailed job planning description and assembly drawings. 


The pre-manufacturing elements were manufactured from documents based on geometries and functional 
requirements that existed as construction documents. A disconnect were thus created between the created 
construction documents and the original design documents. Traceability backwards was secondary, and this made 
lessons learned more difficult to document. In comparison, digitization and model-based construction made the 
information less static, and the information could dynamically be presented/sorted in different ways to describe 
different purposes. 


3.1 Stage one — Identifying Coding Structure and Required Information 


During the workshops, the focus group began by clarifying the vision with the information, creating an aim and 
purpose for the information structure. Here, the client clarified the project vision for the case project, Hovas Tak. 
The client stressed that it should be a “carefully planned and well-thought-out apartment building”. This influenced 
all the project communication and work processes. 


The next workshop challenge was to find a consensus in the project designations. In the absence of clearly 
communicated names of the different parts of the building, prior projects have been shown that the different 
disciplines create their own designations to orient themselves. For the case project, a document was created to 
handle the project designations, and any revisions were logged in a similar way to a building document. The 
marketing team was represented by the client to ensure that a clear consensus was created in the designations 
towards end customers, facilities management, and O&M. 


3.1.1 The disciplines different information structures in the case project 


During the workshop it was identified that the model structure of the designers and the structure for material take- 
off both are linked to building parts and functional quantities. For the case project, Hovas Tak, the main contractor, 
Skanska’s cost calculation structure and process is based on BSAB83 (Swedish classification system from 1983), 
and the BIM-model’s structure is based on BSAB96 (Swedish classification system from 1996). The workshops 
concluded that the BSAB structure worked well for technically describing building components and functional 
quantities. Furthermore, the various contracts that were procured was based on similar groupings of functional 
requirements, so documentation for purchasing was relatively easy to filter out from the BIM. 


@ Apartment building 
© Project schedule 
BTF 1. Milestones 
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5. Construction 
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SJ 6. Completion and inspection 


Fig 2: The schedule and time planning are based on the illustrated information structure. 
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The client's information structure was partly tailored for marketing to end customers and partly for O&M. The 
marketing material used the project designations when describing the final design of the building and the design 
and content of the different apartment types. The marketing language and information structure were adapted to 
attract the identified target customer group. O&M information was structured based on how the management of 
the property was planned. The client had a specific company-adapted structure for the facility management 
information which resembles a general BSAB structure. 


The information structure of the scheduling is based on Skanska’s basic scheduling template for housing projects 
see Fig. 2, which has been developed over the years and enables a rough comparison between projects. The site 
management together with a scheduler bases the schedule on this template structure and then details the phases to 
the specific project in accordance with the site management precious experience from similar projects. 


ş + ve PP 
S Ne F F 
ES + > S 5 aS P 
s rà >. oS + Ra àa O Milestone 1 to 5, total construction 
S s SNE ca- 2” = time for analyzed projects 
$ aon SAS S ve Na pd 
S Sy Fal V a 5 
*@ 2 * Q @* © 
Groundwork + Foundation 
Framework (above ground) 
Roofing + External finishes 
Internal finishes (Weatherseald structure) Hovås Tak 
ovås Tak 


Completion and inspection Š 
SSS 
Fig. 3: When having the same WBS and coding, it is possible to analyze projects KPI’s against each other to find 
outliers. 


For a number of years now, all housing projects at Skanska extract Key Performance Indicators (KPI) for cost and 
time according to crucial milestones — essentially a schedule plan analysis. The general phase schedule which is 
used as input for the analysis is shown in Fig. 3 (left), together with the project completion date KPIs plotted for a 
number of individual projects (right). By using the same general structure in all projects, it is thus possible to 
compare KPI metrics between different apartment building projects and identify outliers or projects that must be 
optimized when it comes to productivity. The case project, Hovas Tak, was analyzed using this specific structure 
to predict and forecast performance. As seen in Fig. 3 (right) the schedule analysis places Hovas Tak at the lower 
end of construction time, and at the mid-range for cost when compared to similar projects (i.e., as illustrated by 
the lower-left rectangle). 


The Construction planning was then based on an execution structure (a step-by-step completion structure), for the 
project. The execution structure was based on Site layout plans for the general construction phases, the master 
schedule, logistics and delivery schedules and purchasing schedule. 


3.1.2 Shared information structure 


The information of each discipline was structured to describe their vital information in an effective way. A common 
denominator was identified as missing by the focus group. The denominator should enable grouping of information 
in a similar way regardless of discipline to coordinate a shared sequential execution The focus group decided to 
refine the basic schedule template and the schedule analysis together with the phases in the construction planning 
to create deliverables that group the information in a similar way for all disciplines. 


A methodology for finding the shared information structure was developed. Through an iterative process where 
the first loop created the first overall names and a first overall division of the deliverables of the construction 
phases and its geographical division were adapted to the project’s conditions and scope. A side effect of this work 
was that as more information was generated in the project, the focus group continually helped coordinate and 
structure the information to create consensus in project designations and construction phases, see Fig. 4. 


CoClass was identified as a possible shared coding structure. However, suitable properties were missing to get a 
shared structure used by all disciplines. 
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Fig 4: Illustration showing how a shared information structure for the construction project could support all the 
disciplines with CoClass coding of Deliverables and Construction scope. 


Construction scope 


In dialogue with Svensk Byggtjänst and the development team of CoClass, a contribution to the classification was 
made in the form of two new properties: 


e Deliverables, “TPPR Produktions etapp” in Swedish, which is defined as a temporal object property 
indicating production phase. 

e Construction scope, named "PRWT Produktionsdel", which can be defined as production property that 
specifies the spatial division of work. 


By adding these two properties, the CoClass structure enables a shared structure to describe the project execution 
as well as the technical description for all disciplines. The case project, Hovås Tak, did not use CoClass to its full 
extents because existing coding structures were too tightly tied to the existing planning and cost management 
systems and their information structures, but the new properties were added to existing systems. 


3.2 Stage two — Implementing and Evaluating Coding Structure and Information 


By using storyboard form like film and sketching out a sequence of events for each construction phase, the focus 
group could agree on a more project-adopted division of the deliverables. The content of each deliverable was 
analyzed to ensure that the deliverables reflected and supported the actual execution. In particular, the focus group 
needed to discuss the transitions between the different deliverables for the information to be delimited in a similar 
way for all disciplines. The structure of the deliverables was then arranged in a hierarchy with a more 
comprehensive structure broken down into more detailed levels for certain deliverables where there was a need to 
describe the execution more clearly, see example in Fig. 5. 


General construction faces ~ Deliverables for Hovås Tak 


P Internal frame completion and finishes |700 Completion of infill walls 


-Interior walls and ceiling 


Surface layers 


Decor and equipment 


Common areas 


Site overheads for internal frame completion and finishes 


Fig. 3: Left part shows the general phases side by side with deliverables. 


When the boundaries of the deliverables began to become clearer, the geographical distribution was analyzed to 
find a suitable material and workflow for each construction scope. This also had the effect that other disciplines 
could provide timely input to each other. 
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The construction scope was created to reflect the material flow to the specific location, the gradual execution of 
tasks within the scope, and provide structured information from construction for use during the O&M. Furthermore, 
digitization and model-based construction and takt-planning made the information less static and enabled filtering 
and sorting in different ways to describe different purposes. 


3.2.1 Challenges 


Both during design and cost management of the project the new coding structure posed challenges. In each phase 
it was found that the supporting systems did not easily allow mirroring of the execution process into deliverables. 
As a solution some composite model objects created from the classifications in the system were decomposed in 
more detailed functional parts to fit with the deliverables. A typical example is found in exterior walls for the case 
project. The design and cost management uses composite object to represent functional object of the exterior wall, 
meanwhile the schedule and the site management represents and complete the exterior wall in four different 
deliverables: 


Construction framework, 
Façade (external finishes), 
Internal frame completion and 
(internal) finishes. 


aii ee 


While changing the systems for cost management and design was not a viable solution, a workaround was needed. 
This created extra administration in each discipline both in working with these WBS codes and filling them out 
and to work and ensure the quality of each discipline's own information. 


3.3 Stage three — Effects of Using the Coding Structure 


The first and foremost effect identified was the reduced language confusion in dialogue between the disciplines 
and how information was consumed between them. The disciplines experienced a reduced need to process and re- 
fit information in later stages of the construction process when the information was coded with deliverables and 
construction scope. By gathering construction results using this shared coding structure, a more holistic 
understanding of the project was formed by each discipline. Information that used to be found in different systems, 
sorted by different coding structures was now found more easily through the shared code structure. Additionally, 
disciplines could use the shared coding and information structure to ensure that they talked about the same 
deliverables and objects. Thus, communication between the design manager, BIM-manager, cost managers, the 
scheduler and site management could flow more easily. 


Following are some distinct examples of how the shared coding structure simplified communication and where 
the combined information created more value than each piece of information on its own. 


3.3.1 Schedule — presented in 3D model. 


By connecting the deliverables of the models and the schedule and its construction scope, a visualization of the 
schedule was created directly using the BIM module in Powerproject. The visualization of the scope and content 
to be scheduled increased the understanding for the disciplines involved, and reconciliation of work completed 
became easier to review. 


3.3.2 Upload quantity takeoff and easier cost control 


Since the quantity takeoff from the model was already coded with deliverables and construction scope, time for 
the cost manager to structure the costing data was shortened. Changes in quantities were easier to identify by the 
focus group and updating the cost control was faster because a smaller amount of information had to be compared 
within a clearly defined area using the construction scope coding. 


Since the cost estimate was able to be sorted based on how the case project, Hovas Tak, was to be executed, 
deliverables were faster reviewed and understood, and each sequence clearly visualized by the model using 
deliverables. The cost management could thus be linked to the degree of completion of each deliverable. 
Performance-based payment plans could also be linked to the work completed in each deliverable. 


3.3.3 Quality work and inspection plan 


The quality controls continued during the execution and ensured that the requirements were met in the finished 
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product. With the help of the deliverables, quality risks for each deliverable could be identified. The monitoring 
of the inspection plan and self-inspections also became more clearly structured. The review of the inspection plan 
by the inspection manager together with the site organization and the various contractors was also greatly 
facilitated. 


3.3.4 Work environment and safety risks 


During the design phase, the principal designer is tasked with identifying work environment risks and as far as 
possible, eliminating them. The identification of work environment risks was facilitated by the simulation of the 
coordination of all discipline’s BIM-models and construction phase schedules, gaining valuable insights into work 
environment risks needed to be considered in each deliverable. 


3.3.5 Basis for takt planning 


The sequential breakdown of the information, simulating the execution of the construction works were identified 
to be fully in line with the structure of the takt planning. One or more deliverables formed the basis from which 
work packages were created. The work packages could then be broken down into takt zones consisting of one or 
more construction scopes. This breakdown formed teams of disciplines that performed the work in each takt. The 
sequence of work in each takt zone formed a takt-train. All BIM-models follow this hierarchy and the addition of 
information and review of status of the BIM-model could be done continuously via each takt-wagon. By 
connecting the takt-planning with the model it was possible to dynamically filter and visualize the takt zones and 
takt-wagons for the different subcontractors. 

i ae a SKANSKA perry 


Fig. 4: By structuring of information in the project and BIM into shared code and information structure that 
supported and reflects how work progresses on-site, it was possible to connect the construction schedule, the 
cost control plan and takt schedule. 


4. DISCUSSION AND ANALYSIS 


The main insight during the focus group meetings were that a shared WBS and coding structure was lacking in the 
execution planning, similar insights has previously been presented but limitations in generalizability has been 
stressed (Cerezo-Narváez et al., 2020; Makarfi Ibrahim et al., 2009). Since the main contracting company recently 
has taken a more of a coordinating responsibility in new projects results show there is a great need to improve 
communication and coordination, similar conclusions are stressed in related studies (e.g., Crowther & Ajayi, 2019; 
Nepal & Staub-French, 2016; Viklund Tallgren et al., 2021). Thus, communication and coordination of 
subcontractors and designers, cost managers, schedulers and other stakeholders is as important for the project as 
it is for the site management. 


The documentation of common language and designations eased communication process throughout the project. 
However, it could be argued that it is not necessary to replace the existing WBS and code structures that are well 
established for each discipline. The key is to bridge organization, process, technology, and information and through 
dialogue find common project denominators that enable the WBS and information to be grouped in shared way by 
different disciplines. This is supported by the conclusions and like the structure presented in Makarfi Ibrahim et 
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al., (2009), but customized and developed for the Swedish context. Deliverables and construction scope groups 
the project in ways that reflect how work progresses on-site and complements the traditional description of the 
construction results emphasized by existing coding structures and existing WBS systems. 


The traditional segmentation of work in contracts, disciplines or functions will still be needed since each area has 
their own needs and demands on how to ensure quality of their information and optimize their work in their systems. 
The traditional division works well enough for optimizing purchases and clarifying responsibilities between 
contracts and disciplines. The added coding of deliverables enables a more pro-active and fine-grained analysis 
than previously was possible. Furthermore, a common WBS and information structure enables new possibilities 
to the development of schedule analysis and the KPI’s on the master schedule critical path. This can give a deeper 
insight and understanding of how different projects progress and find outliers or inefficient projects. But this data 
could also be used in the future for machine learning and artificial intelligence during the bidding phase or planning 
of new construction projects. 


Another outcome from the study was that the implementation by the focus group also improved the overall work 
processes, where the team came together and a closer bond was formed between design, cost management and site 
management, lowering the threshold for communication amongst them, in line with precious research (Crowther 
& Ajayi, 2019; Nepal & Staub-French, 2016; Viklund Tallgren et al., 2021). One of the main insights during the 
implementation of the developed WBS and coding structure was that the dialogue in the focus group was especially 
important to ensure that information from each discipline was coded in the right way. The presentation of the 
visualized sequencing through the BIM-model and the deliverables created a necessary foundation for the 
following dialogue between design, cost management and site management. This was seen in more open 
discussions between disciplines, enabled through the better understanding of the holistic view of the construction 
scope which is in line with previous research (Azhar, 2011; Sacks et al., 2018). The extra administration the 
respective disciplines experienced in the coding and its structure, is thus mitigated. 


By moving from static information to a digital information structure enabled information to dynamically be sorted 
and grouped in new ways to better suit the needs of different stakeholders. Through this common WBS and 
information structure, a common ground was created adding flexibility to each stakeholders’ specific needs, 
without affecting their basic needs. Thus, it could be argued that the process gave a better understanding how 
different parts and resources should work together to reach a better result in the project. 


Since the production of the case project, Hovås Tak, started in January 2023, the production planning and 
implementation of the first production stages has been conducted and evaluated. An initial reflection is that even 
though the information more closely followed how the construction was conducted, it was difficult for the site 
management and workers to absorb the information in this new way, especially in the BIM system used. With 
everyone used to read static drawings and descriptions; new working processes and tools had to be introduced on 
site. A first step to implement the common WBS structure in the site organization could be to visualize the 
information in each deliverable in a workshop form and discuss on how to use the information from each discipline. 
This enables an identification of affected stakeholders and make them understand the scope of the project while 
together developing the detailed job planning and construction schedule. Thus, utilizing the benefits of involving 
the right stakeholders in the planning and scheduling, as seen in Viklund Tallgren et al. (2021). 


The sequential division in different deliverables enables the construction team to focus on one thing at a time, thus 
ensuring that each deliverable reaches each respective goal, quality wise as well as budget wise and schedule wise. 
As the information is coded and structured in a way that easier enables the site management to sort and review 
information regardless of discipline, the site management is enabled to: 


e Clarify the coordination of material flows and logistics for each deliverable. 

e Clarify responsibilities connected to the project’s execution as well as coordination of sequencing of the 

project in general as well as during takt zones. 

Clarify cost flows and performance-based payment plans sorted by deliverables. 

Create a good basis for quality control within each deliverable. 

Create a good basis for inspection rounds and follow-up of work done for deliverable. 

Create good structure for all types of implementation statistics such as work environment, deviations from 

initial project scope and schedule. 

e Get a good structure for the collection of operation and maintenance data and a basis for as-built 
documentation. 


Furthermore, the deliverables enable a standardization and identification of repetitions that also can form the basis 
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for the takt planning, as highlighted in Haghsheno et al. (2016). In the case project, the deliverables were analyzed 
to build effective teams that work together to reach the common goal of the finished deliverable and where the 
construction scope was analyzed to create optimal takt trains. The common WBS structural hierarchy in 
deliverables and construction scope means that the information generated in the execution can be easily linked 
with the information sources (construction document, production estimates, schedules, and production planning 
such as purchasing and logistics). In a way the coding structure standardizes information expressed as missing in 
prior research (Cerezo-Narvaez et al., 2020; Makarfi Ibrahim et al., 2009). Furthermore, the information gap 
between project stages indicated in Borrmann (2018) can be avoided in with the use of shared coding structures as 
brought forward in this research. Project documentation, facility management documentation and O&M 
instructions that normally tend to be based on how it was planned to be built and function, can now reflect as built. 


Knowledge transfer with regards to finance, quality, productivity, becomes administratively simpler when all 
information is sorted with a shared structure and coordinated to get a better overall picture. The risk of some area 
being prioritized and the rest being suboptimized is then avoided. 


Also, by using a common WBS and BIM-model in this way enables data-driven pro-active decisions throughout 
a project. This could in the future assist the construction industry with meeting climate objectives, where informed 
data-driven decisions reduce waste and lower the overall climate impact in the projects. Implementing WBS 
through BIM, with the support of other digital technologies can improve circularity assessments, increase material 
recycling and reuse, and more accurately track environmental data throughout a building's lifecycle all the way to 
decommissioning and dismantling of building. 


Furthermore, this study and the developed WBS and information structure has contributed with input to the 
continued development of the Swedish classification system CoClass. 


5. CONCLUSIONS 


This paper presents a study of the design, development, and validation of a WBS and coding structure for 
supporting BIM and takt-planning in the context of a real construction project, Hovas Tak (case project). The aim 
has been to establish a shared information structure for all disciplines and investigate; what structure is needed for 
production control and management from design to construction and how this structure could help project 
management to understand the project execution and its challenges better. By structuring the WBS and its 
information in the BIM it was possible to support and mirror how work progress on-site. In this context, it was 
possible to connect the construction schedule, site management planning, the cost management, and the model of 
the project to a better holistic understanding. BIM enables the visualization of the step-by-step progression of the 
construction. Digitization and BIM enables detailed multidimensional WBS and coding, a shared code does not 
have to be governing, but a shared coding with lower common denominator can provide new interoperability 
opportunities between disciplines. Digitization, model-based construction makes the information less static and 
enables filtering and sorting BIM in different ways to describe different purposes. Furthermore, the common WBS 
and information structure is also an important base for a pro-active construction management and could be a base 
when it comes to takt-planning of the construction site and to gather information to the digital twin. 


6. ACKNOWLEDGEMENT 


This work is part of the Digital Twin Cities Centre and funded and supported by Sweden's Innovation Agency 
Vinnova under Grant No. 2019-00041 and by SBUF Grant No. 14237 (Development Fund of the Swedish 
Construction Industry) and CMB (Centre for Management of the Built Environment). 


REFERENCES 


Azhar, S. (2011). Building information modeling (BIM): Trends, benefits, risks, and challenges for the AEC 
industry. Leadership and Management in Engineering, 11(3), 241-252. 


Borrmann, A., König, M., Koch, C., & Beetz, J. (2018). Building information modeling: Why? What? How? In 
Building Information Modeling: Technology Foundations and Industry Practice. https://doi.org/10.1007/978-3- 
319-92862-3 1 


Cerezo-Narvaez, A., Pastor-Fernandez, A., Otero-Mateo, M., & Ballesteros-Pérez, P. (2020). Integration of cost 


583 


and work breakdown structures in the management of construction projects. Applied Sciences (Switzerland), 10(4). 
https://doi.org/10.3390/app 10041386 


Crowther, J., & Ajayi, S. O. (2019). Impacts of 4D BIM on construction project performance. International 
Journal of Construction Management, 0(0), 1—14. https://doi.org/10.1080/15623599.2019.1580832 


Disney, O., Johansson, M., Domenico Leto, A., Roupé, M., Sundquist, V., & Gustafsson, M. (2021). Total BIM 
Project: The future of a digital construction process. Industry 4.0 Applications for Full Lifecycle Integration of 
Buildings, 21-30. https://research.chalmers.se, 


Eckerberg, K., Edgar, J.-O., Engstrém, A., Lundgren, T., Onsbring, L., Térnkvist, M., Ténne, M., Ost, T., Bruhner, 
N., & Lundgren, A. (2016). CoClass och LOD Livscykeltest av CoClass-nya generationen BSAB. 


Garci1a-Fornieles, J. M., Fan, I.-S., Perez, A., Wainwright, C., & Sehdev, K. (2003). A Work Breakdown Structure 
that Integrates Different Views in Aircraft Modification Projects. Concurrent Engineering: Research and 
Applications, 11(1), 47-54. https://doi.org/10.1177/106329303032818 


Gaunt, M. (2017). BIM model-based design delivery: Tideway East, England, UK. Proceedings of the Institution 
of Civil Engineers - Smart Infrastructure and Construction, 1703), 50-58. 
https://doi.org/10.1680/jsmic.17.00011 


Haghsheno, S., Binninger, M., Dlouhy, J., & Sterlike, S. (2016). History and Theoretical Foundations of Takt 
Planning and Takt Control. Proceedings IGLC-24, 53—62. www.wissen.de 


Howard, R., Kiviniemi, A., & Samuelson, O. (2002, June). The latest developments in communications and e- 
commerce - IT barometer in 3 Nordic countries. Distributing Knowledge in Building, CIB W78 Conference. 
http://itc.scix.net/ 


Jung, Y., Asce, A. M., & Kang, S. (2007). Knowledge-Based Standard Progress Measurement for Integrated Cost 
and Schedule Performance Control. Journal of Construction Engineering and Mangement, 133(1), 10-21. 
https://doi.org/10.1061/ASCE0733-93642007133:110 


Leicht, D., Castro-Fresno, D., Diaz, J., & Baier, C. (2020). Multidimensional construction planning and agile 
organized project execution-The SD-PROMPT method. Sustainability (Switzerland), 12(16). 
https://do1.org/10.3390/SU12166340 


Makarfi Ibrahim, Y., Kaka, A., Aouad, G., & Kagioglou, M. (2009). Framework for a generic work breakdown 
structure for building projects. Construction Innovation, 9(4), 388-405. 
https://doi.org/10.1108/14714170910995930 


Mohd Nawi, M. N., Baluch, N., & Bahauddin, A. Y. (2014). Impact of fragmentation issue in construction 
industry: An overview. MATEC Web of Conferences, 15, 1—8. https://doi.org/10.1051/matecconf/20141501009 


Nepal, M. P., & Staub-French, S. (2016). Supporting knowledge-intensive construction management tasks in BIM. 
Journal of Information Technology in Construction, 21, 13-38. 


Sacks, R., Eastman, C., Lee, G., & Teicholz, P. (2018). BIM Handbook BIM Handbook Rafael Sacks Ird Edition. 
In John Wiley & Sons. 


Samuelson, O., & Björk, B. C. (2014). A longitudinal study of the adoption of IT technology in the Swedish 
building sector. Automation in Construction, 37, 182—190. https://doi.org/10.1016/j.autcon.2013.10.006 


Samuelson, O., & Stehn, L. (2023). Digital transformation in construction — a review. Journal of Information 
Technology in Construction, 28, 385—404. https://doi.org/10.36680/j.itcon.2023.020 


Viklund Tallgren, M., Johansson, M., Roupé, M., & Ljung, E. (2022). Developing Support for BIM-based Takt 
Time Schedules for Production Control. In C. Park, N. Dawood, F. Pour Rahimian, A. Pedro, & D. Lee (Eds.), 
The future of construction in the context of digital transformation and decarbonization - Proceedings of the 22nd 
International Conference on Construction Applications of Virtual Reality (pp. 723-730). 


Viklund Tallgren, M., Roupé, M., & Johansson, M. (2021). 4D modelling using virtual collaborative planning and 
scheduling. Journal of Information Technology in Construction (ITcon), 26(42), 763-782. 


584 


FCM-ENABLED APPROACH FOR INVESTIGATING 
INTERDEPENDENCIES OF BIM PERFORMANCE FACTORS IN THE 
SUSTAINABLE BUILT ENVIRONMENT 


Pavan Kumar, Aritra Pal, Yun-Tsui Chang & Shang-Hsien Hsieh 
Department of Civil Engineering, National Taiwan University, Taiwan 


ABSTRACT: In pursuit of a sustainable built environment, BIM plays a crucial role in the project's performance 
and has egressed as a powerful technology in the construction industry, impacting the outcome and the project 
delivery workflows. Numerous dynamic and interdependent factors influence BIM performance. However, Existing 
literature prominently focuses on exploring the influencing factors for BIM performance, ignoring the impact and 
strength of the interplay of these factors on one another, therefore offering an inadequate picture of optimizing 
BIM performance. The evolving nature and degree of complexity of construction projects necessitate the 
identification and comprehensive understanding of the interdependencies between factors contributing to BIM 
performance in the sustainable built environment. A Fuzzy Cognitive Map (FCM) is a modeling method that 
represents and analyses the interplay between the factors in a complex system. So, this study proposes an FCM- 
enabled approach to investigate the interdependencies of factors contributing to BIM performance and conduct 
what-if scenarios, including predictive analysis. The developed FCM model can help reveal the hidden cause- 
effect relationships among a complex system of BIM performance factors, enabling stakeholders to develop more 
informed strategies and proactively plan to optimize BIM Performance. 


KEYWORDS: BIM performance, Fuzzy Cognitive Map (FCM), Built environment 


1. INTRODUCTION 


Sustainable development has become the crucial state that all sectors aim to achieve, and the built environment is 
no exception. Sustainable development implies balancing human socioeconomic activities and the natural 
environment's capacity to provide resources and absorb waste on a global scale. In pursuing sustainable 
development, delivering construction projects with improved performance to achieve sustainable goals shall play 
a significant role. Studies in the literature have shown several factors influencing project performance. Chang et 
al. (2017) explored the factors that influence project performance and highlighted the technological aspects that 
significantly affect project performance, pushing stakeholders towards adopting the technology in the built 
environment. Building Information Modeling (BIM) has emerged as a promising digital technology in the AEC 
sector, enabling the ability to enhance performance in areas including design, procurement, prefabrication, 
construction, and post-construction(Wang et al., 2022). Although the BIM concept dates back to the 1970s, the 
adoption of the BIM was seen as significantly increasing since 2000 (Caglayan & Ozorhon, 2023). Effective 
adoption and continuous performance improvement of BIM requires maximizing the benefits and high exploitation 
of the capabilities of BIM, further pushing stakeholders to gain a holistic approach and a deep understanding of 
the dynamics of the influencing factors. Several studies have been conducted on BIM adoption and assessing its 
performance in various project phases, including design and construction. However, BIM performance is not an 
isolated aspect. Moreover, inefficiencies are caused by the influence exerted not by discrete factors but by the 
amalgamate impact of the combination of dynamically connected factors in construction projects. Therefore, 
identifying the causes of inefficiencies also becomes crucial (Zhang et al., 2021) to improve BIM performance. 
While existing studies have made momentous strides in identifying factors that influence BIM performance, there 
is a noticeable gap in understanding the dynamics between the factors whose influence propagates throughout the 
system of the construction project and eventually impacts the BIM performance. This study attempts to address 
this noticeable gap by proposing a Fuzzy Cognitive Map (FCM) model to explore the intricate mechanism of 
dynamic interconnections among the factors that influence BIM performance. The study aims to identify the 
factors (individually or in combination) causing inefficient BIM performance and provides a dynamic model that 
can be used to simulate the propagation of influence caused by policy modification through FCM theory. 


2. LITERATURE REVIEW 
2.1 BIM Influencing Factors 


BIM has been extensively studied in the research community, attributed to its positive impact on project 
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performance. Several studies have been looking into the factors that impact BIM adoption and its performance. 
These factors are often called critical success factors, or risk factors, or key performance indicators (Caglayan & 
Ozorhon, 2023). Rogers et al. (2015) investigated the factors driving BIM adoption for engineering consulting 
services (ECS) in the Malaysian construction industry. Inadequacies of BIM experts, guidance, and government 
support were found to be hurdles for the ECS. Lee et al. (2018) Propose an innovative trust-centric contracting 
model to enhance BIM performance within Engineering, procurement, and construction (EPC) projects and 
explain trust's positive influence on BIM performance. Caglayan & Ozorhon (2023) propose a framework to 
determine BIM effectiveness and identify the project, industry, and company-based factors influencing BIM 
effectiveness. Project and company-based factors were identified to govern the BIM effectiveness. Badrinath et al. 
(2018) propose an empirical methodology to explore the factors for successful BIM projects and identifies highly 
governing factors groups such as the BIM technology, stakeholder skills, and competencies. Several studies have 
examined BIM performance and attempted to explore the influencing factors to exploit the full potential of BIM. 
However, the studies in the literature rarely considered the system complexity and dynamics of the interactions 
among the factors, ignoring the causal propagation of impacts of any discrepancies of influencing factors. 
Furthermore, inefficiencies are caused by various degrees of influence by factors. Hence, identifying and 
eradicating those causes are significant in preventing inefficiencies (Zhang et al., 2021). This necessitates 
analyzing the dynamic relationships among factors influencing BIM performance. Luo et al. (2022) highlighted 
several static methods available to study the influencing factors, such as the Fuzzy analytical hierarchy process 
(FAHP) and Fuzzy analytic network process(FANP), However, these methods pay little to no attention to the 
dynamic interaction and complexity of the system. Hence, this study employs the FCM method to investigate the 
dynamic interactions among BIM performance factors and aims to identify factors causing inefficiencies in BIM 
performance. 


2.2 FCM Approach 


FCM was introduced by Kosko (1986) to model and simulate dynamics systems. FCM helps mimic a complex 
system by considering the causal relationships between the concepts (Poczeta et al., 2020). It is a powerful method 
that can simulate the interaction of factors. FCM enables the systematic propagation of causal relationships 
between the factors, hence a suitable and systematic decision approach for analyzing and deriving insights into 
complex system performance (Zhang et al., 2021). Luo et al. (2022) used FCM to explore the dynamic relationship 
between influencing factors and prefabricated building cost, further, employed FCM to identify the root causes 
and sensitivity of factors to conclude that the scale effect has the greatest effect on the prefabricated cost. Zhang 
et al. (2021) employed the FCM method to measure the Tunnel Boring Machine (TBM) performance and conduct 
root-cause analysis and what-if scenario to explore the dynamic relationship between the factors that influence the 
TBM performance. Case et al. (2018) examine the application of FCM in modeling construction management 
problems and project complexities and details the construction of FCM models for construction management 
problems. Luo et al., (2020) propose a novel hybrid approach that combines the structural equations model (SEM) 
and FCM to examine the impacts of discrepancy in project complexity on the project's success. Luo et al. (2022) 
compared the four typical methods that are used for the simulation of the interaction between the factors. 


3. METHODOLOGY 


The methodology of this research involve FCM-enabled predictive analysis for BIM performance and consist of 
identifying concepts, determinination of relationships, and FCM computation and analyzing, as described in the 
following subsections. 


3.1 FCM Development and Computation 
3.1.1 Identification of Concepts 


Identification of the concepts provides the basic structure for the FCM. FCM consists of several concepts, often 
called nodes or factors, referring to variables, elements, or attributes mimicking the various aspects of the system. 
Zhang et al. (2021) highlight the sources for identifying concepts such as accepted knowledge from the domain, 
empirical knowledge, and domain experts. Figure 1 illustrates a simple FCM model. These concepts (Ci) are 
connected by directed arcs, often called connections or edges, to represent the causal linkage between the concepts. 
Every concept in the FCM model bears a value ranging from 0 to 1 (Poczeta et al., 2020). 
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In this study, BIM performance influencing concepts are derived through a literature review and further reduced 
and added after a brainstorming session with experts. Finally, 18 BIM performance controlling concepts are 
finalized. Concepts include BIM knowledge of the project participants (C1), BIM training (C2), BIM motivation 
(C3), Consistent views on BIM between stakeholders (C4), Top management support/BIM leadership (Cs), Project 
delivery methods encouraging collaboration (Cs), Collaboration and communication (C7), Change management 
(Cs), Project complexity (Co), Availability of BIM guidelines/protocol (Cio), Provisions in contracts on data 
security (C11), Provisions in contracts on liability & risk (C12), Provisions of agreed standard of rules to protect the 
BIM employees (C13), BIM execution plan (C14), Hardware and software infrastructure (C15), The capability of 
hardware and software infrastructure (C16), Information Quality (Ci7), BIM experience of the stakeholder's 
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Fig.1. Illustration of simple FCM 


company (Cis) and BIM Performance (Cr). The descriptions of concepts are described in Table 1. 


Table 1. Concept identification for BIM Performance. 


ID Concept Description Reference 
BIM knowledge of the Competence of project participants in using BIM tools (Caglayan & Ozorhon, 
Ci 
project participants and methodologies. 2023) 
(Einur Azrin 
Training programs to develop BIM-related skills and 
C2 BIM training Baharuddin et al., 
knowledge among project participants. 
2019) 
Incentives and drivers for project participants to adopt 
C3 BIM motivation Suggested by expert 
and implement BIM. 
Consistent views on BIM Alignment and agreement among stakeholders on the (Al-Mohammad et al., 
C4 
between stakeholders goals and benefits of implementing BIM. 2023) 
Top management Endorsement and active support from top-level (Caglayan & Ozorhon, 
Cs 
support/BIM leadership management for successful BIM implementation. 2023) 
Project delivery method facilitating better collaboration 
Project delivery methods (Salim & Mahjoob, 
Co among stakeholders (for example, integrated project 
encouraging collaboration 2020) 
delivery). 
Collaboration and Effective collaboration and communication processes 
C7 (Oraee et al., 2019) 
communication among project participants using BIM. 
A systematic approach to managing and implementing 
Cs Change management Suggested by expert 
changes associated with BIM adoption. 
Co Project complexity Level of complexity and size of the construction project. (Jiang et al., 2021a) 
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Availability of BIM Internal guidelines/protocols for consistent BIM (Al-Mohammad et al., 


Cio 
guidelines/protocol implementation in the project. 2023) 
Provisions in contracts on Provisions in contracts addressing data security, (Al-Mohammad et al., 
Cu 
data security privacy, and confidentiality in BIM projects. 2023) 
Provisions in contracts addressing liability and risk 
Provisions in contracts on (Al-Mohammad et al., 
Ci2 allocation for stakeholders in relation to BIM 
liability & risk 2023) 
implementation. 
Provisions of agreed 
standard of rules to Agreed standards of rules protecting the rights and (Al-Mohammad et al., 
Ci3 
protect the BIM liabilities of individuals involved in BIM projects. 2023) 
employees 
A comprehensive plan outlining BIM implementation (Franz & Messner, 
Cı4 BIM execution plan 
strategy, processes, and deliverables. 2019a) 
Hardware and software Adequate hardware and software resources for BIM (Al-Mohammad et al., 
Cis 
infrastructure implementation. 2023) 
The capability of 
BIM software capability to meet project requirements, (Al-Mohammad et al., 


Cis hardware and software 
handle complex geometries, and more. (interoperability) 2023) 
infrastructure 


Accuracy, completeness, and reliability of information 
Cı7 Information quality (Song et al., 2017) 
within BIM models and datasets. 


BIM experience of the Level of experience and familiarity with BIM (Caglayan & Ozorhon, 
Cis 
stakeholder's company technologies and processes within the company. 2023) 


Effectiveness of BIM adoption and utilization in a 

project, aiming to maximize efficiency, return on (Caglayan & Ozorhon, 
Cr BIM Performance 

investment, and harness the full potential of BIM across 2023) 


all project phases. 


3.1.2 Identification of causal relationship and computation 


The concepts are linked by causal relationships. The direction of the causal relationship is represented by the 
connections or arcs describing the degree of influence between the concept Ci and Cj, often referred to as weights 
(Wi) (S. Lee et al., 2004) that can be positive or negative, with values ranging from -1 and +1. In the complex 
system, in the case of any variation in the state of C; results in a variation in the state of Cj, the arc is used to 
represent the causal relationship between Ci and Cj. Wij > 0 represents the increase or decrease in the Ci leads to 
an increase or decrease in Cj, respectively, while Wij< 0 represents an increase or decrease in Ci leads to a decrease 
or increase in Cj, respectively. Furthermore, if the Wi equals zero, it indicates the absence of a causal 
relationship(Maitra & Banerjee, 2014). Identification of the causal relationship is the key component of building 
a FCM. Luo et al. (2022) highlight the two approaches to determine the degree of causal influence between the 
concepts, such as the learning method, which demands a large number of historical data, and the expert method. 
This study employs expert methods and uses fuzzy semantics to describe the degree of causality among concepts 
using nine levels of fuzzy semantics such as negatively very strong, negatively strong, negatively moderate, 
negatively weak, neutral, positively weak, positively moderate, positively strong, positively very strong with 
membership vales as -1, -0.75, -0.50, -0.25, 0, 0.25, 0.50, 0.75 and 1 respectively. 


Causal interconnections of the FCM are mathematically presented by the n x n matrix(Zhang et al., 2021), and the 
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state value of concept Ci at the time t+1 can be obtained through the following equation (Stylios & Groumpos, 
2004) 


A(t +1) = f(Ai(t) + E- jei(Wji x 4jO)) (1) 


where A;(4 1) represents the state value of the concept of Ci at time t +1 as Ai(f) and Aj(A represents the state 
value of concept Ci and Cj at time t, respectively. 


Two types of threshold functions are employed in the FCM framework, i.e., sigmoid and hyperbolic tangent 
functions. 


JOUA +e) (2) 
Aix)=tanh(x)=((e"-e*)/(e"+e")) (3) 


Eq. (2) is employed to map values between 0 and 1, whereas Eq. (3) is employed to map the values between -1 
and +1 (Zhang et al., 2021). In dynamic FCM models, usually, the connection values range from -1 and 
1(Barbrook-Johnson & Penn, 2022). In this study, experts' responses on causal relationships are taken in the range 
of -1 and 1 to understand the dynamic interaction between the influencing BIM factors. Hence we choose the 
hyperbolic tangent function [Eq. (3)] 


Fig. 2. Fuzzy Cognitive Map of BIM Performance 


The responses of a totally 6 experts from India and Taiwan were received for the questionnaire and respondent 
background is shown in Table 2. The answers from the experts are further complied to calculate the weight for 
each concept by the center of gravity (COG) method. For example, Wir is aggregated to be 0.71 as three experts 
rated positively strong, two experts rated positively moderate, and another rated positively very strong. 


Table 2. Expert's background 


Respondent Professional background Educational background 
Expert 1 BIM Manager Master 

Expert 2 Researcher Doctor 

Expert 3 BIM Manager Master 

Expert 4 BIM Engineer Bachelor 

Expert 5 BIM Manager Master 

Expert 6 BIM Strategist Doctor 
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There are several tools to facilitate the development of the FCM model. Napoles et al. (2018) identified and 
compared several FCM tools and proposed a tool called FCM Expert, which facilitates adequate graphical support 
and higher experimental options. In this study, FCM Expert tool is employed, considering its advantages of FCM 
Expert over other tools. Creating the concepts and assigning aggregated weights to the concepts, results in the 
FCM model of BIM performance that is graphically presented in Fig. 2. FCM facilitates to simulate the behavior 
of systems and enables the what-if experiments (Papageorgiou & Salmeron, 2013). In order to identify the concepts 
influencing BIM performance, considering the dynamic interplay among the concepts, FCM, once modeled, allows 
the predictive analysis. 


3.2 Predictive Analysis and Discussion 


FCM-enabled predictive analysis facilitates the forecast of the outcome or impact of a cause when information or 
evidence of the cause is available (Zhang et al., 2021). It allows to curate the experiments involving simulation of 
impacts of the target variable (Cr) when evidence of change is available in the influencing variables (Luo et al., 
2022). To address the uncertain nature of the concepts, a five-point linguistic scale: very favorable, favorable, 
neutral, unfavorable, and very unfavorable, with numerical values as 1, 0.5, 0, -0.5, and -1, respectively. The 
dynamic propagation of the effect of change in nature (very favorable, favorable, neutral, unfavorable, and very 
unfavorable) of influencing concepts (Cı to Cis) on CT in a system of network simulated and the effect of the 
target variable (Cr), i.e., BIM Performance is observed. For example, the initial nature of all the concepts is set to 
neutral except Co, whose nature is assumed to be very favorable, favorable, unfavorable, or very unfavorable, to 
observe the effect on BIM Performance (Cr) until it stabilizes. Table 3. Presents the stable values of CT after a set 
of iterations in different values (1,0.5, -0.5,1) of Cı to Cis. For example, the stable value of Cr when C; is very 
favorable (=1) is 0.948, implying a positive correlation between Cr and C1. All the concepts except Co tend to have 
a positive correlation with the target variable, i.e., BIM Performance. The stable value of Cr when Co is very 
unfavorable (=-1) is 0.951, implying the negative correlation between them. Fig.3 illustrates the impact of the 
concepts C; and Cy on Cr. Predictive analysis results showed that concepts C; (BIM knowledge of the project 
participants), C2 (BIM training), Co (Project complexity), and C14 (BIM execution plan) have a high influence on 
BIM performance. Similar results are demonstrated by other studies, such as Caglayan & Ozorhon, (2023) 
demonstrate that project-based factors such as BIM training and BIM knowledge of the individuals on the BIM 
project have a direct impact and great influence on the effectiveness of the BIM. Project complexity (Co) has a 
significant impact on BIM performance. The negative correlation here implies poor BIM performance resulting 
from the combination of the influence of BIM knowledge and training. Jiang et al. (2021b) study BIM performance, 
project complexity, and user satisfaction and highlight project complexity as the key factor for BIM performance 
and user satisfaction. Furthermore, Franz & Messner, (2019b) shows the positive impact of the BIM execution 
plan, not only on participating members but also the performance. Similarly, the results of predictive analysis tend 
to show the positive influence of the BIM execution plan (C14) on BIM performance. The predictive analysis aids 
the deeper understanding in enhancing the effectiveness of BIM adoption and aiming to use the full potential of 
BIM. 


1.0 SSS eee eeeaeeeaeeeeaaeeaneani 1.04 VVVV VV 
0.8 SSSSSSHSSSSCOSSCOCSCSSCSCSCSESCOSC ös] WerereerTereryTy TTT Ty) 
w 8-4 
06 f o6] fi 
d j 
0.4 d aed "a 
Da = P(CTIC1=1) | va 
ae os © P(CTIC1=0.5) oF ya = P(CTIC9=1) 
© 00 a P(CTIC1=-0.5) D oo pezzi” = 2 sg 
ary Y P(CTIC1=-1) an hi: a— P(CT|C9=-0.5) 
-0.2 s -0.24 RN v P(CT|C9=-1) 
-0.4 T RPE x 
Y .? 
-0.6 A -0.64 
v J è 
-0.8 AAABAADAAAAAAAAAAAAAAAAAL 0.84 s 
VV VVVVVVVVVVVYYYVYYYVY¥VYY¥) ] LEP RED SARS CHRS ESSE 
-1.0 Beccccnconnoncennnnnnn! 


0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 
Time Time 


Fig.3. Impact of concepts Cı (BIM knowledge of the project participants), Co (Project complexity) on Cr (BIM 
Performance) 
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Table 3. Stable values of Crunder different scenarios 


Concept ID Ci=1 Ci=0.5 Ci = -0.5 Ci=-1 
Cı 0.948 0.823 -0.823 -0.948 
C2 0.942 0.839 -0.839 -0.942 
C3 0.811 0.621 -0.621 -0.811 
Cy 0.830 0.682 -0.682 -0.830 
Cs 0.835 0.689 -0.689 -0.835 
Cs 0.890 0.747 -0.747 -0.890 
(67 0.911 0.793 -0.793 -0.911 
Cs 0.942 0.808 -0.808 -0.942 
Co -0.951 -0.832 0.832 0.951 
Cio 0.892 0.763 -0.763 -0.892 
Cii 0.780 0.611 -0.611 -0.780 
Cio 0.690 0.609 -0.609 -0.690 
Cis 0.780 0.619 -0.619 -0.780 
Ci4 0.979 0.900 -0.900 -0.979 
Cis 0.849 0.721 -0.721 -0.849 
Cie 0.908 0.790 -0.790 -0.908 
Ci 0.940 0.807 -0.807 -0.940 
Cis 0.903 0.788 -0.788 -0.903 


4. CONCLUSION 


BIM adoption and enhancing its performance in construction projects is a complex system where several factors 
influence its effectiveness in exploiting the high potential of BIM. It is crucial to pinpoint the factors causing the 
strong influence on BIM performance, considering the dynamic relationship among them. FCM models are better 
suited to explore and reflect the cause-effect relationship among the concepts (Luo et al., 2022). FCM's approach 
to understanding the dynamic relationship between factors that influence BIM performance is suitable due to its 
dynamic complexity. Furthermore, it allows predictive analysis to forecast the behavior of the network of the 
system. The concepts for the BIM performance were identified from the literature and further filtered through 
brainstorming sessions with experts to finalize 18 concepts. Relationships between the concepts were captured 
through a survey, and FCM was developed to enable predictive analysis. The results showed a high positive 
influence from the concepts: BIM knowledge of the project participants (Cı), BIM training (C2), and BIM 
execution plan (C14) have a big influence on BIM performance, whereas Project complexity (Co) tend to show the 
negative correlation implying the special precautions to be taken by stakeholders to enhance the effectiveness of 
the BIM adoption to leverage the full potential of BIM in high complexity in the construction project. 


In the course of this study, highlighting encountered limitations shall aid the improvement in future studies. The 
factors identified for the study are pivotal for exploring the dynamic interaction among the BIM performance 
influencing factors and are not extensive. Additionally, the reliability of the FCM employed could be strengthened 
with high responses and the widespread input of experts. 


The exploration of the relationships among the factors is enabled through several methods, as mentioned before. 
However, understanding the dynamic relationship among them and the propagation of ripple effects among the 
network system is better understood through adopting less static models like FAEM and FMEA. In order to capture 
the dynamic complexity, the systems thinking approach can be used to understand the behavior of the system of 
BIM and assess BIM's performance with higher experts' involvement. To date, several construction activities are 
highly dependent on the experience of the experts; the FCM can be used to capture crucial human knowledge in 
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the context of effective BIM adoption to aid better decision-making for the stakeholders involved. 
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ABSTRACT: Currently, industrial robot arms are trending in prefabricated building construction; however, a 
notable gap exists in established automated processes and related research specifically for the insertion of batt 
thermal insulation. The current method for accomplishing this task relies on manual insertion, which is labour- 
intensive for the workers and poses long-term health and safety concerns. This research presents an ongoing 
research project aimed at developing a feasible robotic process for the automated insertion of batt thermal 
insulation into prefabricated light-frame wood wall frames. This research focuses on the utilization of a single 6- 
degree-of-freedom robot arm for the insertion process, complimented by the design of a custom-built end-effector. 
The proposed robotic insertion process, named GLITPP, comprises of six major steps: (1) Grasp, (2) Lift, (3) Insert, 
(4) Tilt, (5) Push, and (6) Press. The GLITPP insertion process, along with the custom-built end-effector effectively 
mitigates the influence of the insulation 5 nonlinear mechanical properties, while also taking collision avoidance 
into consideration. This ensures a tight-fitting insulation within the frame cavity, without visible gaps and 
deficiencies. The necessary physical operating parameters for the insertion process, such as angles, offset, and 
force requirements, are identified to ensure the precision, efficiency, and repeatability of insertion. A prototype of 
the designed end-effector is used to demonstrate and validate the robotic method, achieved a high success rate of 
93.3%. The development of this research will further advance the complete automation of light-frame wood wall 
panel prefabrication, offering the industry a wider range of options for selecting thermal insulation for their 
processes. 


KEYWORDS: Robotic Building Prefabrication; Robotic Insertion; Light-frame Wood Construction; Robotic End- 
effector; Automation in Construction; Thermal Insulation 


1. INTRODUCTION 


Industrial robot arm technology is increasingly being demonstrated and utilized in building construction processes 
due to its cost effectiveness, ease of programmability, high accuracy, and capacity (Chai et al., 2022; Koerner-Al- 
Rawi, Park, Phillips, Pickoff, & Tortorici, 2020; Leung, Apolinarska, Tanadini, Gramazio, & Kohler, 2021). Robot 
arms support mass production due to their high efficiency in performing repetitive tasks and their scalability, 
making them particularly suitable for the construction of prefabricated modular light-frame wood wall panels. 
These panels are prefabricated offsite, incorporating essential building elements such as framing, insulation, and 
sheathing. The use of robotic arms for cutting, assembling, and nailing timber for framing and sheathing is already 
well established and utilized (Stricot-Tarboton, 2019). 


Various insulating materials are available on the market for timber framed structures, but the common types include 
blow-in insulation, spray foam insulation, and batt thermal insulation, made from fiberglass or mineral wool. 
Among these, batt thermal insulation holds a dominant market position due to their low cost to effective thermal 
performance ratio (Latif, Bevan, & Woolley, 2019). Currently, the automated insulation installation solution used 
to support the construction of modular light-frame wood wall panels are blow-in insulation (Orlowski, 2020) and 
spray-foam ("SprayWorks Equipment” n.d.; “Spray-R” n.d.), due to their loose form and ability to easily conform 
to voids. However, batt thermal insulation still needs to be manually installed by workers in the light-frame wood 
wall panel construction process (Stricot-Tarboton, 2019), which hinders achieving a fully automated construction 
process using this type of insulation. 


The mechanical behaviors of batt insulation made of either mineral wool or fiberglass are classified as semi-rigid 
and non-rigid, respectively. In terms of mechanical properties, both these insulations are considered anisotropic 
deformable materials due to random fiber orientation. The result is that the rigid body assumption cannot be 
employed. In addition, predicting deformation and deflection of insulation using classical approaches is 
insufficient. Furthermore, the relationship between stress and strain is nonlinear thus complicating the modelling 
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process which has downstream effects on simulation and analysis. To the best of our knowledge, there has been 
no formal study or research on the robotic process of inserting anisotropic deformable material into a shallow 
cavity for a tight-fit. Manipulating deformable objects presents challenges as classical analytical force relationships 
are no longer applicable. Consequently, predicting and controlling material behavior during manipulation requires 
complex contact analysis and modeling of nonlinear material behavior (Arriola-Rios et al., 2020; Zaidi, Corrales, 
Bouzgarrou, Mezouar, & Sabourin, 2017). 


The successful development of this research will allow for direct integration into existing prefabricated modular 
light-frame wood wall panels construction processes to realize full automation or prefabricated construction using 
batt thermal insulation. The implementation of this automated process offers several benefits, including the 
elimination of risks to workers from developing chronic respiratory diseases due to exposure to airborne dust and 
fibers, as well as reducing their exposure to carcinogens and volatile organic compounds. Additionally, it will 
decrease the ergonomic risk arising from repetitive movements during manual insulation installation. (BREUM, 
SCHNEIDER, JORGENSEN, VALDBJORN RASMUSSEN, & SKIBSTRUP ERIKSEN, 2003; Kupczewska- 
Dobecka, Konieczko, & Czerczak, 2020; Li, Han, Gül, & Al-Hussein, 2019). 


This paper is structured into six sections. Section 2 introduces the research objectives. In section 3, the design 
concept of the robotic end-effector and robotic insertion process are explained. Section 4 covers the 
implementation and parameters identification. Following that, Section 5 presents the outcomes of the conducted 
experiments, providing an in-depth analysis of the results. Lastly, Section 6 presents our conclusions and outlines 
potential avenues for future research and development. 


2. RESEARCH OBJECTIVES 


This research aims to develop a robotic method to automatically insert batt thermal insulation into a light-frame 
wood wall frame. Studies have been conducted to explore the process of utilizing industrial robot arms to insulate 
wall panels through blow-in insulation and spray foam insulation methods. However, the utilization of robotic 
methods for inserting batt insulation has been rarely discussed. When implementing the robot arm for batt 
insulation insertion, the most critical challenges are effectively manipulating deformable materials, along with 
ensuring collision-free trajectories for the robot arm. To achieve the goal of automatically inserting batt thermal 
insulation using robotic methods, the objectives of this research will focus on the following: 


1. Designing a robotic end-effector capable of proficiently manipulating batt thermal insulation, while 
considering its geometrical properties and deformable material behaviors. 

2. Developing a robotic process for inserting batt thermal insulation into light-frame cavities using the proposed 
robotic end-effector to mitigate the influence of the deformation uncertainties and nonlinear mechanical 
properties. 

3. Identifying variable parameters (e.g., angles, location, forces, etc.) of the developed robotic process to ensure 
the integrity and success of insertion and to minimize potential collisions. 


3. ROBOTIC END-EFFECTOR & INSERTION PROCESS DESIGN 


The developed robotic method contains two major parts: the robotic end-effector and the robotic insertion process. 
First, to facilitate the pickup and manipulate batt insulation to correct positions, a dedicated robotic end-effector 
was designed. Then, a robotic insertion process with six major steps was developed to effectively insert the batt 
insulation into the light-frame. 


3.1 Assumptions 
The developed robotic insertion process is based on the following three assumptions: 


1. The light-frame wood wall is a typical straight wall without any piping or wires within the cavity. 

2. The wood stud utilized in the frame is in a favorable condition, with negligible warping. 

3. The working platform features a smooth surface, and the friction between the insulation and the working 
platform is considered negligible. 


3.2 Robotic End-effector Design 


Illustrated in Figure 1, the end-effector comprises four components: a two-finger gripper, a force-torque sensor, an 
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adaptor, and a pair of gripping jaws. The two-finger gripper with linear stroke is an off-the-shelf product. For 
standard batt insulation, its width is larger than the width of the cavity to achieve a tight-fit once inserted. These 
dimensional differences require the robot arm to apply compressive force to insert the batt insulation into the cavity. 
Therefore, a force-torque sensor that allows control of the applied force is mounted atop of the two-finger gripper. 
The gripping jaw adaptors are designed to connect the off-the-shelf gripper to the gripping jaws. The adaptors are 
CNC-machined L-shaped steel braces. The last component of the end-effector is the custom-built gripping jaws. 
The surface inside the jaw is textured to increase friction, with all else being equal reduces the grasping force, 
between the jaw and the insulation. The sizing of the jaw should be determined considering the dimensions and 
weight of the insulation, as well as its degree of elasticity. The jaw dimension is crucial to minimize the permanent 


deformation during manipulation. 


— 


Open Close 


Figure 1. The proposed robotic end-effector. 
3.3 Robotic Insertion Process Design 


The proposed process for inserting batt insulation in this section is inspired by both the manual insertion process 
and the research conducted by Kim & Seo (2019). The research focused on the insertion of rigid objects into 
shallow cavities, incorporating primitive operations such as grasping, tilting, and tucking. In the context of batt 
insulation insertion, the proposed robotic insertion process, coined as GLITPP, contains six major steps: (1) Grasp, 
(2) Lift, (3) Insert, (4) Tilt, (5) Push, and (6) Press (as illustrated in Figure 2). 


r : 7 i J fia J 


(1) Grasp (2) Lift (3) Insert 


d : Insertion offset 
a Insertion Angle 


ð : Tilting Angle 


L [ I Press Spacin; 
LELER EFi PITT Aptis 
Frye: Pressing Force 
(4) Tilt (5) Push (6) Press 


Figure 2. The GLITPP insertion process. 
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The initial step is to grasp the insulation from an initialized position. The robot arm moves to the top of the 
insulation using a point-to-point motion (PTP motion). Subsequently, the robot arm descends linearly (LIN motion) 
until the gripper reaches the insulation’s top edge. Upon reaching the edge, the gripper closes to securely grasp the 
insulation. The second step is lifting. After grasping the insulation, a LIN motion is utilized to lift the insulation 
linearly. This is followed by a PTP motion to position the insulation in proximity of the frame cavity. In the third 
step, a combination of PTP and LIN motions is programmed to insert one edge of the insulation into the frame 
cavity with consideration of an insertion angle (a) and offset (d). For the fourth step, the end-effector is tilted. 
This ensures that when the grippers open, they avoid collisions with the frame and release the insulation without 
shearing it. Achieving this involves a PTP motion, tilting the end-effector to a specific angle (@), and then releasing 
the insulation. To align the insulation accurately with the frame, pushing operations are employed along the 
uninserted edges. The robot arm utilizes LIN motions to enable the gripping jaws to gently push the insulation 
until it aligns flush with the cavity. For the final step, the gripping jaws are employed to press the uninserted edges 
of the insulation into the frame cavity, using defined press spacing (l) and pressing force (Fpress) parameters. The 
pressing pattern initiates from the corners of the inserted edge and proceeds along the edges perpendicular to the 
inserted edge and then finally along the parallel uninserted edge. The task description and associated robot motions 
outlined above are summarized in Table 1. 


Table 1. Six steps of the GLITPP insertion process. 


Steps Task Description Related robot motions 
Grasp From the initial position, the insulation is securely . PTP motion to the top of the insulation. 
grasped by robotic gripper for pick up and . LIN motion down till the gripper reaches the 
manipulating. insulation’s edge. 
. Close the gripper. 
Lift The robot arm picks up the insulation and moves it . LIN motion to lift the insulation. 
close to the frame cavity. . PTP motion to move the insulation close the frame 
cavity. 
Insert The robot arm inserts the insulation into the cavity . PTP motion to rotate the insulation. 
with an insertion angle (a) and offset (d). . LIN motion to insert the insulation into the cavity. 
Tilt The robot arm tilts the insulation to a tilting angle . PTP motion to tilt the insulation 
(9). 
Push The robot arm uses its gripping jaw to push the . LIN motion down till tip of the gripping jaw reaches 
insulation along the uninserted edges of the the bottom of the insulation. 
insulation until it is flushed with the cavity. . LIN motions parallel to the frame’s direction to push 
the insulation into the cavity. 
Press The robot arm uses its gripping jaws to press the . PTP motion to the pressing location. 


insulation into the frame cavity. 


. LIN motions descend till the force reaches the 


pressing force. 


. LIN motion up. 
4. LIN motion to the section pressing location 
. Repeat step 2 to 4 until the insulation is inserted. 


4. IMPLEMENTATION & PARAMETER INDETIFICATIONS 
4.1 Prototype of the End-effector 


For the implementation, this research developed a prototype of the robotic end-effector (Figure 3). Detailed 
specifications of the end-effector components are listed in Table 2. The Robotiq FT 300-S Force Torque sensor, 
capable of measuring forces up to +300 N and offering a data output of 100 Hz was utilized. Additionally, the 
Robotiq Hand-e gripper, with a maximum stroke length of 2 inches, was selected to serve as the parallel gripper. 
The custom adaptor was designed, in accordance with the Hand-e gripper specifications, to incorporate M3 screws 
for attachment onto the gripper’s tracks. The gripping jaws, with dimensions of 4 inches by 4.5 inches and a 
thickness of 3/16 inch, were 3D printed with a textured interior surface at the lower section. The final assembly of 
the end-effector provides a clearance of 3.75 inches when the gripper is open and reduces to 1.75 inches when 
closed. 
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Figure 3. The prototyped robotic end-effector at Fully open position (left) and at fully closed position (right). 


Table 2. Specifications for the prototyped end-effector. 


Part Manufacturer Model Specifications 


Force torque sensor Robotiq FT 300-S e Measuring range: +300 N 
e Data output rate:100 Hz 


Two-finger gripper Robotiq Hand-e e Stroke: 2 in 
e Form-fit grip payload: 11 Ibs 
e Friction grip payload: 8.8 lbs 
e Weight: 2 lbs 
e Gripping force: 20 to 185N 


Adaptor Custom-built - e Material: CNC machined steel 


Gripping jaws Custom-built - e Material: 3D printed PLA 
e Size: 4-inch (W) x 4.5-inch (L)  3/16-inch (T) 


4.2 In-lab Robotic Cell Setup 


The in-lab robotic cell setup is illustrated in Figure 4. A Universal Robot URSe was utilized as the robotic 
manipulator. The URSe is a robot arm with 6 degree-of-freedoms. Its operational capacity is 11 lbs for payloads 
and accompanied by a maximum reach span of 33.5 inches. The robot arm was mounted on 46 inches by 34 inches 
table platform. The platform’s upper surface is constructed from plastic, providing a smooth texture that minimizes 
friction effects between the insulation and the surface. Ultimately, the prototyped end-effector was affixed to the 
6" axis of URSe. 


Figure 4. The in-lab robotic cell setup. 
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The light-frame wood wall utilized in this research is constructed using four 2 inches by 4 inches SPF Dimensional 
Lumbers. This lumber dimension is representative of a common type of light-frame wood wall used in building 
construction. The spacing between wood studs is set at 16 inches on-center, a standard stud spacing commonly 
employed in building construction, which results in a cavity width of 14.5 inches. Due to hardware constraints, 
specifically the reach of the robotic arm, the height of the wall cavity has been scaled to 26 inches, instead of the 
typical 8-foot wall. The overall dimensions of the wood cavity are 14.5 inches in width, 26 inches in height, and 
3.5 inches in depth. 


4.3 Parameters Identification 


The GLITPP insertion process involves five key parameters: insertion angle (a), insertion offset (d), tilt angle (0), 
press spacing (l) and pressing force (Fp;e;;). These parameter values are determined through a series of individual 
robotic trials. All the trials are tested on 26 inches of mineral wool insulation. Once a set of feasible parameter 
values are ascertained, they are integrated to illustrate and formalize the entire robotic insertion process, the results 
of which will be presented in Section 5. 


4.3.1 Insertion angle & Insertion offset & Tilt angle 


There exists an interdependency among the insertion angle (a), insertion offset (d), and tilt angle (0). Changes in 
the values of one parameter affect the other, as demonstrated by Eq.1. This equation represents an affine function 
that defines all points (x, y) on the outer surface of the fully opened gripper jaw in its tilted position. Notably, the 
parameter values must satisfy the condition that the affine function (hyperplane) is positioned to the left of the 
boundary point (P). This arrangement ensures collisions-free between the frame and the gripper. Figure 5 illustrates 
the variables used in Eq.1: L represents the distance from the tool center point (TCP) to the end of the insulation, 
H is the wood stud width, which is equal to the insulation thickness, W denotes the frame cavity width, E stands 
for the distance from the TCP to end of the gripper, G represents the distance between the center line and the outer 
surface of the gripper jaw in its fully open position, and P corresponds to the inner edge of the wood stud, serving 
as the boundary point. 


y = (2x * sin (0) — 2L x sin (80 — a) + H x cos (0 —a) — 2 * d x sin (0) — 2G)/(2 cos (8) ) (1) 


Figure 5. Illustration of the variables in Eq.1. 


Due to the texturing on the gripper, a tilt operation is employed to facilitate the release of insulation from the 
gripper jaws after insertion into the frame cavity. This approach reduces disruptions to the insulation’s intended 
position, prevents damage to its fibers by eliminating shearing, and allows for maintaining a minimal insertion 
angle. Thus, obtaining the minimum tilt angle is essential for achieving the effective release of insulation. The 
determination of this minimum tilt angle then allows for the calculation of the insertion angle and offset. Moreover, 
tilting indirectly contributes to the insertion process by pressing more edges of the insulation into the frame cavity. 
The minimun tilt angle is determined to be 55°, factoring in the insulation’s weight and frictional forces. A list of 
trialed tilt angles and their corresponding outcomes are presented in Table 3. 


Table 3. A list of trialed tilt angles with success rate. 


0 (°) Success Rate Failure Mode 
30 0/5 Insulations were pulled out of the frame cavity. 
45 0/5 Insulations were first pulled, followed by shearing of the insulation fibers. 
55 5/5 N/A 
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The insertion offset is the distance between the insulation and the frame’s edge. When the offset is either too small 
or too large, the risk of collision increases during the insertion or pressing steps, respectively. This necessitates 
finding a balance between within the feasible range of offsets. Regarding the insertion angle, a smaller angle results 
in a longer edge of the insulation being inserted into the frame cavity. This minimizes the likelihood for the 
insulation edges to catch or snag during the subsequent titling and pressing steps, thus, ensuring proper seating of 
the insulation within the frame cavity. The lower limit of the insertion angle depends on the tilt angle in order to 
prevent collisions after titling when the grippers open to release the insulation. The insertion angle for a specific 
insertion offset, utilizing the minimum tilt angle obtained earlier, can be determined from Figure 6. This figure is 
generated by computing the solution pairs that satisfy Eq.1. Three combinations were selected for testing and the 
results are presented in Table 4. The tested minimum feasible combination of insertion angle and insertion offset 
is achieved at 30° with an offset of 0.75 inches. 


37 


a (°) 


i d (in) 


Figure 6. Combination of insertion angle and insertion offset. 


Table 4. A list of trialed combination of insertion angle and insertion offset with success rate. 


a (°) d (in) Success Rate Failure Mode 
29 0.25 0/5 Collision warning during insertion due to excessive compression of insulation. 
29.5 0.5 0/5 Collision warning during insertion due to excessive compression of insulation. 
30 0.75 5/5 N/A 


4.3.2 Press spacing 


The press spacing signifies the positions where the gripper will apply pressure along the uninserted edge of the 
insulation, facilitating its complete insertion into the frame cavity while ensuring no insulation edges remain 
exposed. While there are no explicit limitations on the quantity of presses, minimizing press count contributes to 
time efficiency. In the conducted trials, the center-to-center distance of the press spacing varies from 15 inches to 
5 inches, with the initial press initiated from a corner. The outcomes of the trials are compiled in Table 5. It was 
noted that a 5-inch press spacing effectively accomplishes the insulation’s insertion into the frame cavity, without 
any conspicuous convexity. 


Table 5. A list of trialed press spacing with success rate. 


l (in) Success Rate Failure Mode 
15 0/5 Evident convexity noticeable between each press point. 
10 3/5 Evident convexity noticeable between certain press points. 
5 5/5 N/A 
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4.3.3 Pressing force 


Once the press spacing is determined, it becomes crucial to apply the minimum amount of force necessary to press 
the insulation into the frame cavity. This approach ensures that the insulation avoids permanent deformation, which 
could lead to a loss of effective R-value. The lower and upper bounds for the pressing force are 80N and 120N, 
respectively. The results of the trials are listed in Table 6. It was determined that a pressing force of 100N fully 
and reliably presses the insulation into the frame cavity each time. 


Table 6. A list of trialed pressing force with success rate. 


Fpress N) Success Rate Failure Mode 
80 2/5 There were instances that the insulation was not fully pressed in. 
100 5/5 N/A 
120 4/5 There was an instance where the insulation had permanent deformation. 


5. VALIDATION 


The designed GLITPP insertion process, using the parameters obtained above as shown in Table 7, was tested with 
mineral wool and fiberglass batt insulations in two distinct scenarios that represent the actual configurations of 
insulation installation in construction: (1) a single piece of insulation filing the entire frame cavity, and (2) two 
pieces of insulation planed in tandem within the frame cavity. In Scenario 2, the process involved two steps: first 
inserting the 20-inch piece and then 6-inch piece to fill the entire frame cavity. During the insertion of the 6-inch 
piece, an additional 1-inch offset was applied between the two insulations to avoid the interaction of large frictional 
forces between the mating surfaces. For Scenario 1, ten tests were conducted for each insulation type. For Scenario 
2, ten tests were conducted for each step and each insulation type. Each test was performed using new insulation 
to simulate the actual application in construction. 


Table 7. Summarized selected parameter values obtained from Section 4.3. 


Parameter Value 
Tilt Angle (0) 55° 
Insertion Angle (a) 30° 
Insertion Offset (d) 0.75 in 
Press Spacing (L) 5 in 
Pressing Force (Fpyess) 100 N 


5.1 Results and Discussion 


Table 8 summarizes the experiment results using the selected parameters obtained in Section 4.3. Fig. 8 shows the 
progress of all the experiments and final insertion completion with the front and back of the insulation. The success 
of the entire GLITPP insertion process is defined by the insulation fitting tightly within the frame cavity, the 
absence of visible voids and gaps between the insulation and wood frame, and the insulation having no shearing 
or permanent deformation. The overall success rate stands at 93.3%. 


Table 8. Experiment results for entire GLITPP insertion process. 


Scenario Insulation Length (in) Batt Insulation Type Success Rate 
1 26 Mineral Wool 9/10 
1 26 Fiberglass 10/10 
2.1 20 Mineral Wool 10/10 
2.1 20 Fiberglass 9/10 
2:2, 6 Mineral Wool 8/10 
2.2 6 Fiberglass 10/10 
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Grasp Lift Insert Tilt Push Press Front Back 


2.1 


2.1 


2.2 


2.2 


Figure 8. Experiment progress. Each row is the scenario associated with the corresponding rows of the Table 7. 


The high success rates, as showed in Table 8, were achieved by mitigating the negative effects of deformation and 
uncertainties in mechanical properties through the GLITPP insertion process. The integration of individual 
parameter identification into a single continuous process, facilitated by custom-built grippers, is seamless and 
repeatable. The results substantiate that the manipulation of deformable insulation using the designed grasping, 
lifting, and inserting steps yields high accuracy in achieving target positions. The tilting step reduces the risk of 
shearing of the insulation during release, while the pushing step offers guidance that minimizes uncertainties and 
random disturbances before pressing. Ultimately, the pressing step ensures a tight-fitted insulation within the frame 
cavity, without any discernible gaps and deficiencies. A notable feature of the GLITPP process is that it can be 
extended to full-scaled 2x4 light-frame wood wall panels and light-frame wood wall panels with varying wood 
stud sizes, achieved by employing different sizes/numbers of grippers and tuning of parameters. 


There were four instances in which the GLITPP process did not succeed. In scenario 2.1, the failure was an outlier, 
as no defects were observed in the insulation. For scenarios 1 and 2.2, the lack of success resulted from pre-existing 
creases and pockets of low-density in the insulation. Notably, the insulations used in validation were all chosen 
randomly from the packaging without any rejection of unideal pieces of insulation. Incorporating insulation pre- 
inspection, selection, and rejection would raise the success rate. 
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6. CONCLUSION 


This research introduces a robotic method to insert batt thermal insulation into light-frame wood wall frame. The 
method comprises two major components: a custom-built end-effector and a corresponding robotic insertion 
process. The design of the robotic end-effector integrates seamlessly with an off-the-shelf two-finger gripper. The 
end-effector is constructed of a force-torque sensor, a two-finger gripper, an adaptor, and a pair of gripping jaws. 
The proposed robotic insertion process, named GLITPP, encompasses a sequence of six major steps: Grasp, Lift, 
Insert, Tilt, Push, and Press. To identify the variable parameters within the GLITPP insertion process, an in-lab 
robotic cell equipped with a prototyped end-effector was utilized. Through a series of individual robotic trials and 
iterative refinements, these parameter values were determined. The effectiveness and feasibility of the proposed 
robotic method were evaluated using two common batt thermal insulations: mineral wool and fiberglass. Test 
scenarios encompassed both a single insulation piece filling the entire frame cavity and the tandem placement of 
two insulation pieces within the cavity. The results exhibited a remarkable 93.3% success rate for the GLITPP 
insertion process. To ensure the broader applicability of the proposed method, future works will involve testing 
the GLITPP insertion process on a larger capacity robot arm with full-scaled panels. Additionally, the integration 
of an insulation condition identifying sensor is envisioned, enhancing adaptability by combining it with our robotic 
insertion process. 
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THE IMPACTS OF DIGITAL FABRICATION ON THE CONSTRUCTION 
INDUSTRY: A SYSTEMATIC REVIEW 


Mehdi Keshtkar, Emmanuel Daniel & Louis Gyoh 
School of Architecture and Built Environment, University of Wolverhampton, United Kingdom 


ABSTRACT: The building industry is a major consumer of natural resources and a large contributor to 
environmental degradation, leading to a need to rethink current building practices. Digital fabrication (Dfab) 
technologies, which transform design and engineering data into physical products, are gaining traction in the 
Architecture, Engineering and Construction (AEC) industry. This study aimed to evaluate the implications of 
digital fabrication in the construction industry, by identifying the current Dfab applications and the hindrances 
that are limiting its implementation. The research questions addressed were why Dfab is essential in the 
construction sector, the current state-of-the-art of Dfab in the construction industry, and how Dfab is improving 
the construction industry. Through a systematic literature review, the findings proposed that Dfab can revolutionize 
the construction sector, enabling freeform architecture, reducing construction costs, cutting material waste, and 
increasing worker safety. Nevertheless, further research is needed to overcome obstacles such as high costs and 
the lack of digital skills in the construction industry. 


KEYWORDS: Digital fabrication, Construction industry, Project management, Digital technology, Systematic 
review. 


1. INTRODUCTION 


The building industry is recognized as a large consumer of natural resources and a significant contributor to 
environmental impacts and is considered an inefficient manufacturing process (Wu, Wang and Wang, 2016). It is 
still working to improve the situation and boost overall productivity, but there are obstacles to overcome (Garcia 
de Soto et al., 2018). To address environmental challenges, there is a need to rethink conventional building models 
and techniques due to the predicted increase in global population in the coming decades (Naboni, Breseghello and 
Kaunic, 2019). To promote sustainability, the architectural profession needs to develop fully automated production 
forms and processes (Tuvayanond and Prasittisopin, 2023). 


The ability to create objects directly from design information is causing a transformation in many fields of design 
and production (Agusti-Juan and Habert, 2017). The key to fostering high-quality industry growth is creating and 
applying digital transformation (Yuan et al., 2022). The use of digital fabrication (Dfab) technologies is rapidly 
increasing in the Architecture, Engineering and Construction (AEC) industry (Graser, Kahlert and Hall, 2021). 
The "third industrial revolution," also known as digital fabrication, is anticipated to revolutionize the construction 
sector by allowing freeform architecture, lowering construction costs, reducing material waste, and raising worker 
safety (Wangler et al., 2016). Dfab refers to a construction process that utilizes digital code to control 
manufacturing devices and processes, allowing for the seamless conversion of design and engineering data into 
physical products (Graser, Kahlert and Hall, 2021). Dfab is an automated fabrication method that uses data to 
enhance efficiency and productivity (Ng and Hall, 2021).The technology began developing more than 25 years 
ago, but its rapid development started later (Žujović et al., 2022). The use of Digital design and fabrication 
technologies have created methods and processes for producing more complex and customised architectural 
solutions while still utilising standard building materials over the last two decades (Carvalho and Sousa, 2014). 
Integrating design and construction is essential for new technologies such as digital fabrication, and a specialised 
design management strategy is required to overcome integration barriers (Ng, Graser and Hall, 2023). Digital 
technology allows for better control, increased construction efficiency, the removal of the need for conventional 
formwork, and the ability to customise building materials during the construction process compared to traditional 
methods (Yuan et al., 2022). The use of computational design and robotic fabrication together has the potential to 
bring about significant advancements in the form and structure of architecture (Agusti-Juan and Habert, 2017). 


Digital fabrication necessitates a redesign of the design process. Thus, there is a need for a better understanding of 
digital systems in areas such as technical development, technological systems, organisational contexts, contractual 
provisions, and business models (Ng et al., 2022). However, the use of additive Dfab in large-scale construction 
is still in its early stages and requires overcoming challenges in changing traditional construction processes and 
roles of those involved in the project (Garcia de Soto et al., 2018). 
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A BIM platform is not essential for Dfab design in small-scale projects, as long as there is integration of process, 
information, and organisation (Ng, Graser and Hall, 2023). BIM is a cutting-edge digital system that promotes 
innovation and enhances project values through information integration in construction projects, which also 
involves changes in design management and overall best practices (Ng, Graser and Hall, 2023). Different non- 
destructive methodologies to capture complex shapes have been developed through the use of photographs, video- 
recording, laser sensors and LED light projections, demonstrating the significant advantages in speed and accuracy 
that these digital methods can offer compared to conventional analogue processes (Lorenzo and Mimendi, 2020). 
Many countries have plans to increase the proportion of construction activities carried out off-site (Kim, Cuong 
and Shim, 2022). However, the effectiveness of DFAB is determined by the inclusion of fabrication information 
and organization in the design process, which can be challenging to achieve in traditional delivery models such as 
Design-Bid-Build (Ng and Hall, 2021). 


Project management and delivery models have shifted fundamentally due to the digitization of project information 
(Hall, Whyte and Lessing, 2020). The uniqueness of each construction work is due to its immobility, customization 
of both construction works and processes, and interdisciplinarity (Bischof, Mata-Falcon and Kaufmann, 2022). 
DFAB techniques combine automated subtractive, formative, or additive building methods with computational 
design approaches (Garcia de Soto et al., 2018). An alternative to costly and inefficient manufacturing practices 
was proposed through automation in construction and architecture (Tuvayanond and Prasittisopin, 2023). The 
limits of architectural design and production may be expanded by digital fabrication techniques (Agusti-Juan and 
Habert, 2017). Dfab adoption faces not only technical challenges, but also organizational and process barriers, as 
it involves multiple research disciplines and professions such as architects, materials scientists, roboticists, 
structural engineers, manufacturers, and trade contractors (Graser, Kahlert and Hall, 2021). However, there is a 
desire to investigate alternative methods of creating formworks using digital fabrication technology (Carvalho and 
Sousa, 2014). 


In recent years, the intersection between digital fabrication techniques and cementitious materials has become 
significant (Wangler et al., 2016). Digital fabrication with concrete is a newly developed and wide-ranging field 
that can potentially reduce environmental impact and promote industrialization in construction while meeting 
various construction requirements (Bischof, Mata-Falcon, & Kaufmann, 2022). Modern product creation has 
shifted to rely heavily on 3D printing (Agusti-Juan & Habert, 2017). Digital fabrication has been applied to the 
production of formworks using concrete, a significant application of this technology (Wangler et al., 2016). 
However, in free-form, digital fabrication using concrete, accurately predicting the material's mechanical 
properties in its fresh state is crucial to ensure control over element deformations and overall stability during the 
printing process (Esposito et al., 2021). Bucklin et al. (2023) describe a new construction method called the Mono- 
Material Wood Wall (MMWW), which employs subtractive manufacturing with digital control to enhance the 
functionality of wood and eliminate the need for other materials, thereby improving sustainability compared to 
traditional construction techniques. The impact of the fast-growing demand and regeneration rate of renewable 
building materials on the environment in the long term is yet to be determined despite the industry's shift towards 
them (Bitting et al., 2022). However, using non-traditional renewable materials and developing suitable design 
and construction processes will be necessary for large-scale construction (Lorenzo & Mimendi, 2020). 


According to Graser, Kahlert, & Hall (2021), it is crucial to reduce the time it takes to introduce new Dfab 
technologies to the market to speed up adoption, but this has been challenging. To successfully implement digital 
fabrication in the construction industry, better integration of fabrication-related information and organization into 
the design process is needed despite its growing emergence (Ng & Hall, 2021). Correspondingly, an increasing 
number of studies investigate the industry needs and strategies for adopting digital fabrication (Ng et al., 2022). It 
is essential to research the environmental advantages of digital fabrication in architecture and construction, as it is 
still a developing technology, to make necessary adjustments in the early stages (Agusti-Juan and Habert, 2017). 
Despite extensive literature outlining its challenges, limited attention has been given to strategies employed in 
projects to successfully implement Dfab. The construction industry is currently focusing its research efforts on 
robotic fabrication, collaborative work between humans and robots, and prefabricated technologies as part of smart 
construction (Yuan et al., 2022). 


Given this, the current study determines the impact of digital fabrication on the construction industry. The study 
seeks to answer the following research questions: 


1. Why is digital fabrication important in construction industry? 


2. What is the state-of-the-art of digital fabrication related to construction industry? 
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SECTION B - ADVANCED PROJECT MANAGEMENT AND CONTROL 


3. How is digital fabrication improving the construction industry? 


2. RESEARCH DESIGN AND METHODOLOGY 


The analysis performed in this study identified the main research themes and categorized them based on the 
impacts of digital fabrication (Dfab) in the construction industry. The results may benefit other researchers as they 
summarise recent advancements, patterns, and potential research and innovation opportunities in the AEC sector. 
Based on the selected research philosophy, this study adopts a qualitative research strategy with an inductive 
approach. In qualitative research papers, the methods section emphasizes transparency of the methods used, such 
as the reasons, processes, and individuals involved in their implementation, to provide a deeper understanding and 
facilitate discussion of how they may have affected material’s mechanical properties (Busetto, Wick and 
Gumbinger, 2020). 


This study used the systematic literature review (SLR) method, that minimizes bias by exhaustively searching 
relevant studies through a systematic, transparent, and reproducible process (Chung, Lee and Kim, 2021). This 
study utilized the keyword search method and the snowball method to gather relevant information. In order to 
gather more information and discover papers that may have been overlooked, the keyword search and the 
snowballing technique were combined. To initiate the development of this study, the primary task was to identify 
the relevant keyword for retrieving research articles from academic databases. The following summary provides 
an outline of the process involved in this stage. 


Researchers utilized the Scopus database, benefiting from its advanced search features, including filters for authors, 
affiliations, publication years, and document types, facilitating the discovery of pertinent and up-to-date research 
in specific fields. In the initial search for relevant literature, researchers employed the keyword string "[digital 
AND fabrication AND construction AND industry]" and obtained 300 documents. Subsequently, they applied 
specific restrictions, including open access availability, subject area in engineering, English language, and exact 
keywords "Digital Fabrication" or "Construction Industry," resulting in the retrieval of 47 documents directly 
related to their research topic. 
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Earth and Plane... (24% 
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Figure 1: The distribution of documents by subject area before applying any restrictions to the search results in 
Scopus 
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Figure 2: The distribution of documents by year before applying any restrictions to the search results in Scopus 
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The main objective of this section was to evaluate the appropriateness of the selected resources rather than the 
procedure itself. To determine eligibility, the study utilized an inclusion and exclusion approach, ensuring that only 
publications directly related to emerging digital fabrication in the construction industry were included. A benefit 
of these resources is that they contain highly relevant information pertaining to the study's topic. The researchers 
based their document selection on criteria that focused on the relevance of content to "Data Analysis and 
Management" in the most recent publications within the "Construction" field, specifically related to the use of 
"Digital Fabrication" in construction. Through an examination of titles, abstracts, and full texts, irrelevant 
documents were excluded, resulting in the selection of 27 articles that met these criteria. 


The chosen resources were subject to content analysis, with most being journal articles providing comprehensive 
insights into digital fabrication in construction. While some sources were not directly construction-related, they 
still contributed to understanding the emerging digital technology. The selected studies utilized diverse qualitative 
or quantitative research methods, ensuring varied findings that enhance the credibility of this study concerning the 
research questions. Finally, the last step involved identifying the primary themes associated with the research 
questions. 


3. DATA ANALYSIS AND MANAGEMENT 


These documents' titles, abstracts, and full texts were examined to remove any irrelevant ones, resulting in 27 
newly selected articles that are relevant to the study as shown in table1. 


Tablel: 27 newly selected articles that are relevant to the study 


Tittle Year Country Source 

1 Designing for Digital Fabrication: An Empirical Study of | 2021 Switzerland Journal of Management in Engineering 
Industry Needs, Perceived Benefits, and Strategies for 
Adoption 

2 DFAB HOUSE: implications of a  building-scale | 2021 Switzerland Construction Management and 
demonstrator for adoption of digital fabrication in AEC Economics 

3 Digital fabrication, BIM and early contractor | 2023 Switzerland Architectural Engineering and Design 
involvement in design in construction projects: a Management 


comparative case study 


4 Environmental assessment of multi-functional building | 2019 The International Journal of Life Cycle 
elements constructed with digital fabrication techniques Assessment 
Switzerland 
5 Feasibility study of large-scale mass customization 3D | 2022 China Frontiers of Architectural Research 


printing framework system with a case study on Nanjing 


Happy Valley East Gate 
6 Mirror-breaking strategies to enable digital | 2020 Switzerland CONSTRUCTION MANAGEMENT 
manufacturing in Silicon Valley construction firms: a AND ECONOMICS 


comparative case study 


7 Multi-scale design and fabrication of the Trabeculae | 2019 Denmark Additive Manufacturing 
Pavilion 

8 Productivity of digital fabrication in construction: Cost | 2018 United Arab | Automation in Construction 
and time analysis of a robotically built wall Emirates 
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9 Teaching Target Value Design for Digital Fabrication in | 2021 Switzerland 29th Annual Conference of the 
an Online Game: Overview and Case Study International Group for Lean 
Construction 
10 | Design for Manufacture and Assembly of Digital | 2023 Thailand Buildings 
Fabrication and Additive Manufacturing in Construction: 
A Review 
11 3D Printing Technologies in Architectural Design and | 2022 Serbia Buildings 
Construction: A Systematic Literature Review 
12 Challenges and Opportunities in Scaling up Architectural | 2022 Switzerland Biomimetics 
Applications of Mycelium-Based Materials with Digital 
Fabrication 
13 A critical review of the use of 3-D printing in the | 2016 Australia Automation in Construction 
construction industry 
14 | Digital Concrete: Opportunities and Challenges 2016 Switzerland RILEM Technical Letters 
15 Digital Fabrication Technology in Concrete Architecture | 2014 Portugal 32nd International Conference on 
Education and research in Computer 
Aided Architectural Design in Europe 
16 | Digital Fabrication for DfMA of a Prefabricated Bridge | 2022 South Korea The 17th East Asia-Pacific Conference 
Pier on Structural Engineering & 
Construction 
17 | Digitisation of bamboo culms for structural applications | 2020 United Kingdom | Journal of Building Engineering 
18 Early-age creep behaviour of 3D printable mortars: | 2021 Italy Materials and Structures 
Experimental characterisation and analytical modelling 
19 | Environmental design guidelines for digital fabrication 2017 Switzerland Journal of Cleaner Production 
20 | Environmental Impact of a Mono-Material Timber | 2017 Germany Sustainability 
Building Envelope with Enhanced Energy Performance 
21 Fostering innovative and sustainable mass-market | 2022 Switzerland Cement and Concrete Research 
construction using digital fabrication with concrete 
22 | Framework for technical specifications of 3D concrete | 2021 South Korea Automation in Construction 
printers 
23 Identifying enablers and relational ontology networks in | 2022 Switzerland Automation in Construction 
design for digital fabrication 
24 | Rethinking reinforcement for digital fabrication with | 2018 Italy Cement and Concrete Research 
concrete 
25 Toward Lean Management for Digital Fabrication: a | 2019 Switzerland 27th Annual Conference of the 
Review of the Shared Practices of Lean, Df{MA and dfab International Group for Lean 
Construction (IGLC) 
26 | Towards Automated Installation of Reinforcement Using | 2019 Sweden 2019 24th IEEE International 
Industrial Robots Conference on Emerging Technologies 
and Factory Automation (ETFA) 
27 | Using Computer Vision for Monitoring the Quality 2022 India Sustainability 


of 3D-Printed Concrete Structures 
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CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


Figure 3 displays the distribution of chosen documents based on their year of publication, while Figure 4 illustrates 
the distribution of chosen documents based on the country where the research was conducted. 
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Figure 3: The distribution of chosen documents based on their year of publication 


Figure 4: The distribution of chosen documents based on the country where the research was conducted 


4. RESULT AND DISCUSSIONS 

4.1 Qualitative Analysis and Discussion 

The study aim to present an overview of the impacts of digital fabrication in the construction industry. 
4.1.1 Importance of digital fabrication in the construction industry 


Due to industry fragmentation, the AEC sector adopts new technologies more slowly than other sectors, but the 
emergence of Digital Fabrication (DFAB) offers a systematic innovation that can help with this problem (Ng, 
Graser and Hall, 2023). Recent studies have focused on the impact of new digital technologies like Building 
Information Modelling (BIM) on design management (Ng, Graser and Hall, 2023). Although a complete 
consolidation that outlines the factors contributing to the design process for digital fabrication is currently 
unavailable (Ng et al., 2022). However, research on Dfab is still in its early stages. It lacks well-developed 
mechanisms allowing full-scale project adoption in the sector (Ng and Hall, 2019). 


According to Ng et al. (2022), igital Fabrication is becoming increasingly common due to its potential to improve 
project efficiency by connecting design and construction processes, and it can be categorized into five groups: 
technological systems, organizational framework, contractual terms, and business models. Two possible 
approaches for dfab management are provided by lean construction management and design for manufacture and 
assembly (DfMA) (Ng and Hall, 2019). Adopting DFAB has many advantages, such as increased productivity and 
resource efficiency, reduced waste in the building industry, and increased worker safety (Graser, Kahlert and Hall, 
2021). 
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Projects offer a distinct opportunity to investigate and add to the emerging understanding of Dfab in AEC due to 
their capacity to integrate complex knowledge (Graser, Kahlert and Hall, 2021). The adoption of Dfab in AEC 
faces significant challenges due to the industry's fragmentation, weak coordination between contractors, and high 
participant turnover between project phases, making the organizational and social context as important for industry 
adoption as technological feasibility (Graser, Kahlert and Hall, 2021). To minimize environmental impacts, 
structural complexity should result from material reduction strategies such as structural optimization or multi- 
functionality (Agusti-Juan, Jipa and Habert, 2019). 


According to Agusti-Juan, Jipa and Habert (2019), Digital fabrication techniques that achieve multi-functionality 
lead to a construction process that is efficient in its use of materials and has significant environmental benefits 
during production. The on-site mass production of complex, customised structures is made possible by digital 
fabrication in building (Agusti-Juan and Habert, 2017). With the projected increase in global population, it is 
necessary to rethink traditional building methods and establish new techniques to reduce the environmental impact 
of the construction sector. Digital fabrication can aid in this effort by reducing material usage and overall 
environmental impact (Naboni, Breseghello and Kunic, 2019). 


However, to make a positive change in the built environment, this mode of digital architecture is expected to work 
towards fully automated production forms and processes that promote equality, sustainability, democracy, diversity, 
and inclusiveness (Žujović et al., 2022). Collaboration between structural engineers, roboticists, builders, and 
material scientists will be crucial for digitally fabricating concrete (Wangler et al., 2016). 


4.1.2 The state-of-the-art of Digital Fabrication related to the construction industry 


Academic and industrial applications have explored various additive technologies in different scales and contexts, 
from thermoplastics to clay, gantry 3D printers to robotic arms and drones (Naboni, Breseghello and Kunic, 2019). 
Many researchers are looking into robotic 3D printing, a new digital fabrication technique, to address the problem 
of traditional building methods' declining productivity (Yuan et al., 2022). 


On-site digital fabrication, which aims to bring additive fabrication processes to construction sites, is divided into 
three main categories: large-scale robotic structures, mobile robotic arms, and flying robotic vehicles (Garcia de 
Soto et al., 2018). Scholarship explores the use of digital systems, such as BIM platforms that can help stakeholders 
coordinate the management data, including 3D models and algorithms that link to digital fabrication (Ng et al., 
2022). 


The data from the researched case study by Graser, Kahlert and Hall (2021) indicates that implementing DFAB 
projects can increase its acceptance as a legitimate practice in AEC. However, for DFAB adoption to be successful, 
it needs to be accepted not just within the project organization but also outside it. Large-scale AM machines are 
being used to construct recent architectural projects globally, which has sparked a growing interest in implementing 
and expanding the technology within the construction industry and architecture (Tuvayanond and Prasittisopin, 
2023). A study by Bitting et al. (2022) provides an overview of the current state of research and applications of 
mycelium-based materials, emphasizing digital fabrication, production, and design and discussing issues such as 
low mechanical properties and the absence of standardized production methods. The use of digital design 
information to drive production processes, such as 3D extrusion printing, CNC machines, and robotic assembly, 
is known as digital fabrication, and it is an essential component of modern construction processes (Ng et al., 2022). 


Wu, Wang and Wang (2016) explored the significance of component design about 3D printing capabilities and raw 
material performance and the potential benefits of using BIM to support design variations and improve 
performance, while also reducing the time and costs associated with design changes and reprinting. 


Despite the potential advantages of automation, there have been few cases of robots being used to automate 
construction in recent years (Relefors et al., 2019). The prefabricated bridge construction process has used Df{MA, 
a design method commonly used in manufacturing manufacturing (Kim, Cuong and Shim, 2022). Digital 
fabrication techniques can be categorized into subtractive methods such as milling and cutting, and additive 
methods such as 3D printing, which has become increasingly popular and accessible for home use (Agusti-Juan 
and Habert, 2017). 


Bischof, Mata-Falcon and Kaufmann (2022) assert that widespread adoption of digital fabrication in the 
construction industry is critical to making a meaningful impact on improving its environmental impact, but 
currently, it has not yet reached the mass market. Chung, Lee and Kim (2021) point out that despite the rapid 
expansion of research and market for 3D concrete printing (3DCP), there is a lack of a widely accepted technical 
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specification framework for comparing 3DCPs with various characteristics. Despite automation initiatives in both 
research and industry, such as Built Robotics and MX3D, the construction industry has not yet demonstrated a 
shift towards automation (Relefors et al., 2019). 


4.1.3 How Digital Fabrication improves the construction industry 


Digital fabrication has the potential to bring about extensive positive impacts, such as improved material efficiency 
and waste avoidance, reuse of materials, workplace health and safety, integrative work design, and productivity 
(Graser, Kahlert and Hall, 2021). The integration of digital and manual tasks was crucial for the project, and there 
was a need for better collaboration processes with digital machinery (Graser, Kahlert and Hall, 2021). The 
productivity rate for robotic construction is constant, which means it doesn't depend on the complexity level of the 
construction (Garcia de Soto et al., 2018). Concrete 3D printing reduces waste production by 60%, construction 
time by 50-70%, and labour costs by 50-80%, potentially decreasing construction costs by up to 35% while 
improving the industry's sustainability (Senthilnathan and Raphael, 2022). 


To promote sustainable development opportunities through the use of digital systems, design modeling with 
parametric modeling capacity can be utilized to minimize rework and waste by testing the feasibility and soundness 
of integrated digital twin models through physical mockups prior to tendering (Ng et al., 2022). Despite being 
promoted in various countries, there is a lack of consistency and diversity in stakeholder perspectives and research 
advancements regarding the implementation of digital fabrication, with interdependencies between industry needs 
creating complexities for stakeholders to adopt such projects, further hindering their adoption on a larger scale, 
highlighting the need for a better understanding of industry practitioners' needs and how they are related to one 
another (Ng et al., 2022). 


According to the research by Ng and Hall (2021), Target Value Design (TVD) implementation can help manage 
and optimize DFAB processes to meet time, cost, profit, and aesthetic requirements in less time while maintaining 
the needs of stakeholders. The conventional Design-Bid-Build model's separate processes can impede the 
implementation of digital fabrication techniques by making it difficult for stakeholders to manage project costs 
(Ng and Hall, 2021). Digital fabrication is anticipated to result in a more sustainable construction industry by 
enabling more efficient structural design that uses materials only where necessary and by reducing waste 
generation through more efficient construction techniques, particularly about formwork (Wangler et al., 2016). 


5. RECOMMENDATIONS AND DIRECTIONS FOR FUTURE RESEARCH 


The impacts of digital fabrication in the construction industry are still an area that requires further research. Several 
recommendations and directions for future research can be made based on the reviewed literature. One area that 
requires investigation is the potential economic benefits of digital fabrication in construction projects. Future 
studies could conduct a cost-benefit analysis to provide a clearer understanding of the potential economic benefits 
that could be achieved by implementing digital fabrication in the construction industry. Another area that requires 
exploration is the potential environmental benefits of digital fabrication in the construction industry. Future studies 
could focus on the potential environmental benefits that could be achieved through digital fabrication in the 
construction industry, such as reducing waste and carbon emissions. In addition, future research could investigate 
the best strategies for implementing digital fabrication in the construction industry. This could include examining 
the barriers to adoption, identifying practical training and education programs, and exploring the potential role of 
government policies and incentives. The reviewed studies provide valuable insights into the impacts of digital 
fabrication in the construction industry. They are applicable in various areas within the field, including but not 
limited to construction management, architecture, and engineering. For example, the studies provide insights into 
the potential benefits of digital fabrication in terms of cost, time, and quality management in construction projects. 
They also provide insights into the potential for digital fabrication to revolutionize the design and construction of 
buildings and other structures, as well as improve the efficiency and effectiveness of engineering processes in the 
construction industry. 


One potential research question that could be addressed in future studies is: What are the best strategies for 
overcoming the barriers to adoption of digital fabrication in the construction industry? This question would be 
designed to address the identified need for research on implementation strategies and could help to provide insights 
into how digital fabrication can be successfully integrated into the construction industry. 
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6. CONCLUSION 


The research on the impacts of digital fabrication in the construction industry highlights the potential benefits and 
challenges associated with adopting this technology. Through a systematic literature review, the study explores the 
current state of digital fabrication (Dfab) in construction, its significance, and its potential to improve the sector. 


The study's extensive research has provided valuable insights into how digital fabrication could bring about a 
revolution in the construction sector. Firstly, Dfab enables the creation of intricate and customized structures that 
were previously impossible using conventional methods, thus offering new possibilities for innovative and 
sustainable designs that can shape the industry's future. Secondly, Dfab has the capacity to substantially lower 
construction costs and minimize material waste, thereby boosting efficiency and contributing to resource 
conservation, a crucial factor for environmental sustainability. Additionally, the adoption of Dfab could enhance 
worker safety by automating hazardous tasks and reducing the necessity for manual labor in risky conditions. 
Despite the intriguing benefits, the study has brought to light the difficulties that prevent Dfab from being widely 
used in the building industry. To effectively utilise the promise of digital fabrication, significant barriers such as 
high starting prices and a lack of digital expertise in the market must be overcome. Also, there are organisational 
and operational challenges when incorporating Dfab into conventional building processes and delivery models, 
which emphasises the necessity of communication and cooperation across many disciplines. 


The qualitative analysis conducted in this study highlights the importance of seamless integration and collaboration 
among various stakeholders, such as architects, engineers, roboticists, and material scientists, for the successful 
deployment of Dfab technologies in the construction industry. Furthermore, the adoption of digital fabrication calls 
for a comprehensive redesign of the design process, considering technical development, organizational contexts, 
contractual provisions, and business models. 


This study was limited to academic journals, articles, and conference proceedings found in the listed scientific 
sources. Following an inductive methodology that only used secondary data, the qualitative analysis and discussion 
were conducted. Primary data, however, might have provided a more in-depth and analytical grasp of the subject. 


The study recommends further research to investigate the economic and environmental benefits of implementing 
Dfab in construction projects. Additionally, it emphasizes the need to identify effective strategies for overcoming 
barriers to adoption to ensure successful integration. 
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ABSTRACT: Video-based fire detection is a crucial object detection problem that relies on accurate and reliable 
data to detect fires. However, collecting and labeling fire-related data can be time-consuming and expensive, 
making it difficult to obtain sufficient data for training machine learning models. To address this challenge, 
uncertainty-based active learning techniques can be used to iteratively select the most informative samples for 
labeling. This can reduce the amount of labeled data needed to achieve high model performance and has the 
potential to even prune the training data with fewer informative samples. The traditional sampling-based 
uncertainty estimation methods are computationally expensive. Hence, an efficient prior network-based ensemble 
distillation State-of-the-Art approach is evaluated on an internal dataset which still requires relatively higher 
overhead computation making it difficult for production deployment. A biased softmax differencing-based 
uncertainty approach and a feature-based hard data mining approach are proposed and compared with the 
distillation approach. The novel approaches are found to have a very low overhead uncertainty estimation time 
compared to the ensemble distillation approach and traditional sampling techniques. The methods are evaluated 
in the context of curating the unlabeled pool data and improving the training data. For completeness, the 
experiments are performed on three different data sizes, and overall, the frame-wise selection strategy is proved 
to be better than the sequence-wise querying strategy. The Principal Component Analysis (PCA)-based hard data 
mining outperformed other methods and improved the model performance by 16.33% with AUC2% metric when 
compared with the random selection of data. The approach even outperformed the main network trained on full 
data by 7.33%, henceforth improving the training data by using informative 26.39% data. The results indicate that 
novel data mining provides efficient training and pool data curation. 


KEYWORDS: Uncertainty Estimation, Active Learning, Object Detection, Outlier Detection, Feature-based 
cluster analysis, Video-based Fire Detection 


1. INTRODUCTION 


Traditional smoke detectors require a volume of smoke to reach the detector location and hence generally have a 
high detection time. Thermal cameras (Sousa, et al. 2020) on the contrary are quite expensive and work in the 
infrared spectrum resulting in fire detection only when there is significant heat produced. There is no visual 
confirmation of the fire when using thermal cameras. Hence, deep learning-based video detection can be the 
solution to decrease the detection time and detect fires based on patterns in the video rather than heat produced. 
However, for an industrial Deep Learning (DL) application of Fire Detection (FD), the speed and the reliability of 
the predictions are of major concern. The non-reliability can lead to high economic losses and even human 
endangerment. Late detection can cause heavy economic and even human losses. Speaking about statistics in the 
industrial setting in the USA, 1.2 Billion $ in economic loss along with 16 deaths and 273 injuries occur annually 
(Campbell, 2018) indicating the importance of reliable video-based fire detection. However, in order to have a 
reliable DL model for detecting fire, the data selected for the model training should be of high quality. 


The research takes inspiration from Peter Norvig’s quote: “More data beats clever algorithms, but better data 
beats more data”. The traditional thinking of improving a DL model by increasing the quantity of the data in object 
detection does not answer why the new data is being added to the training data. Nevertheless, even if large amounts 
of data is readily available, in object detection problem, the requirement of labeled data raises the cost of annotation 
massively. In order to improve the performance of the model and simultaneously decrease the cost of labeling, the 
most informative data have to be sampled. 
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The aim of decreasing the annotation costs can be achieved by answering two questions: which data has to be 
selected and why? The passive learning models which receive the data randomly or by humans do not consider the 
informativeness associated with the data. However, the information is associated with uncertainty related to the 
data. With the help of uncertain information linked with the data, using Active Learning (AL), the informative data 
can be sampled iteratively. Sampling informative data using AL requires the estimation of uncertainty. The 
uncertainty can be referred to as a negative reliability score while a higher uncertainty score means low reliability. 
The uncertainty can help increase the reliability of the DL models by sampling informative uncertain data from 
the large dataset, thereby using AL to decrease the cost of the annotation. The concept of AL can assist in the data 
selection process or autonomously select data, which can subsequently be reviewed and labeled by a human 
annotator for subsequent training sessions. 


The major contributions of our work can be summarized as follows: 


e We propose two different methods for estimating uncertainty. One of the methods focuses on estimating 
uncertainty using feature space and the other approach uses softmax differences for uncertainty estimation. 

e We compare different methods based on the time required for uncertainty estimation and the performance 
when implemented in an iterative AL setting. 

e We experimentally show that our novel approach is efficient w.r.t time for uncertainty estimation. The 
approach even outperforms the State-of-the-Art (SOTA) approach on the AUC metric in an active learning 
setting. 

e We evaluate the performance of different methods w.r.t improving the full training data set by decreasing the 
data using AL. 


2. RELATED WORK 


Uncertainty estimation can be used as an additional component for the DL model to increase the trust and 
robustness of the SOTA architectures. DL models are often black box models, with often limited or no 
interpretability of the results. With the predictions of the DL model, uncertainty estimates can be incorporated to 
increase the reliability of the results from the black box models. Gal and Ghahramani (Gal & Ghahramani, 2016) 
introduced a Monte Carlo Dropout-based (MCD) uncertainty quantification method that has a very low overhead 
computation cost. The drop block-based uncertainty estimation (Deepshikha, et al. 2021) was proposed using 
Monte-Carlo DropBlock (MCDB). Deep ensembles (DeepEns) (Lakshminarayanan, et al. 2017) was used to 
estimate the predictive uncertainty using model re-training. The Test-Time-data Augmentation (TTA) 
(Manivannan, 2020) was compared with MCD, MCDB, and DeepEns in the context of uncertainty estimation. The 
research in the domain of uncertainty quantification is usually done in the fields of model and data uncertainty 
separately. However, one of the first methods combining the effects of both epistemic and aleatoric uncertainty 
was proposed in (Kendall & Gal, 2017). The loss function to estimate both uncertainties was suggested for the 
depth regression and segmentation tasks. Malinin and Gales proposed to estimate the predictive uncertainty of the 
deep learning model using Prior Networks (Malinin & Gales, 2018) which incorporated explicit modeling of 
distributional uncertainty. This approach parameterized the Dirichlet network over the categorical distribution to 
maintain the distribution extraction capability from the student model after the knowledge distillation (Malinin, et 
al. 2019). The Bosch internal research method of a low overhead FACER (Schorn & Gauerhof, 2020) based prior 
network (FacerDir) uncertainty quantification method will be used in the thesis as one of the SOTA methods for 
benchmarking. However, due to the high uncertainty estimation computation time, we propose novel approaches 
with both lower estimation time and more reliable uncertainty estimates. 


With the need for better-performing object detection methods, the requirement of the data for training the SOTA 
architectures has also increased. In order to prioritize the labeling of the data and data curation, a lot of research 
has been done in the field of Active Learning. A comparison of various acquisition functions for computing 
uncertainty was done in the setup of AL (Nguyen, et al. 2022). Choi et al. (Choi, et al. 2021) proposed a 
probabilistic approach to estimate both the model and data uncertainty in a single pass and later performed Active 
Learning. The scalability and transferability were tested over the probabilistic Active Learning approach. An 
adaptive framework for active learning was developed and proposed in (Desai, et al. 2019) which performed 
adaptive switching between strong and weak supervision. 


The present research in the fields of uncertainty estimation and AL will serve as the inspiration for the proposed 
paper, which aims to incorporate various techniques into the internal architecture of the Video-based Smoke 
Detection (VSD) model. 
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3. METHODOLOGY 


An overview of the AL framework using different uncertainty estimation strategies using a business chart is 
depicted in Fig. 1. The chart illustrates the brief methodology of the approaches in an AL setting. 
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Fig. 1 Thesis methodology which describes the active learning framework implementation using various 
uncertainty estimation techniques. 


3.1 Dataset 


The dataset used in the thesis was developed by the engineering team at Bosch. As this dataset is a Bosch internal 
dataset, it is not available to the public. The dataset comprises Smoke and Non-smoke, “Negative” videos. 


The dataset includes video sequences shot over 100 different locations. The locations can be classified into three 
major scenarios, viz. indoor, outdoor, and semi-outdoor. The scenarios which are shot in an indoor setting ranging 
from a parking lot to an industry are considered Indoor scenario. The outdoor scenario is a scenario shot in an open 
environment. The semi-outdoor scenario is on the contrary a setting in which there is a ceiling but lacks enclosure 
from all sides, resulting in a partially open space. Wind may be present in the outdoor setting and this changes the 
behavior of the smoke significantly. 


3.2 Uncertainty Estimation 
3.2.1 PCA-based Hard Data Mining (PhDm) 


This approach is a novel method that is based on Principal Component Analysis (PCA) (Jolliffe & Cadima, 2016) 
for detecting hard samples from the pool data. PCA is a dimensionality reduction approach used to project the data 
to a low-dimension space and provide interpretability to the data. The PCA is applied to the feature vector space 
and reduced to the dimension of two. For every image sample, only the feature vector of the maximum prediction 
is used for dimensionality reduction as shown in Fig. 2. 


The outliers found in the PCA plot were investigated and they were found to be often either hard or out-of- 
distribution examples. Hence, a density-based outlier detection method has been implemented to extract outliers 
from the PCA plot. Local Outlier Factor (LOF) (Breunig, et al. 2000) was used for detecting outliers in the plot 
which looks at the local density for each point. To query top outlier samples from the plot, a distance-based ranking 
is performed which ranks each outlier point based on the distance from the nearest inlier point. 
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Fig. 2 Feature vector dimensionality reduction. The features (middle) are extracted through the model for the 
input images (left) and later PCA-based dimensionality reduction is performed. The bold patch represents the 
patch with maximum softmax prediction whose feature vector is used for dimensionality reduction. 


3.2.2 Biased Maximum Softmax Difference (BiasedMSD) 
Typically, the final layer of a deep learning classification model is the softmax layer, which produces predictions 


for a given input. Ideally, the model should accurately replicate the ground truth of the input sample. However, in 
practice, an entirely perfect or ideal model is unattainable, and a prediction close to the expected ground truth is 


typically observed. 
Bounding box E | 
information 


fis | Softmax Difference 


Fig. 3 Illustration of BiasedMSD approach. The difference between the model prediction (mid-bottom) and the 
ground truths (mid-top) is performed and termed as softmax difference (Bold patch is the patch with maximum 
prediction and it’s softmax difference is used for the whole frame). 


Model prediction 


The Biased Maximum Softmax Difference is another novel approach that is quite simple and easy to implement. 
The definition of uncertainty that is adopted in this approach is the distance between the ground truth and reality. 
For the patch-wise classification problem, as the prediction obtained is patch-wise, the softmax difference is also 
obtained patch-wise for individual frames. 


The drawback of the approach is that one needs the labels beforehand to estimate uncertainty. Due to this 
shortcoming and the requirements of the labels beforehand, the method is called Biased Maximum Softmax 
Difference (BiasedMSD), while the bias of requiring ground truth is present for estimating the uncertainty. The 
visual representation of the working of the BiasedMSD is illustrated in Fig. 3. The inherent bias to require labels 
for uncertainty estimation suggests that the method may only be used for curating labeled data. 


3.3 Active Learning 


The final step of the methodology in the scenario of Active Learning (AL) encompasses the implementation of all 
previously described efficient methods. AL initiates with random sequences in the training dataset, while the 
remaining data is stored in the pool set. The central concept of AL involves adding a budget (k) of samples or 
videos to the training set iteratively and removing samples from the pool set. As a result, the iterative process leads 
to an increase in the size of the training set and a decrease in the size of the pool set. The training of the model is 
performed on a set of training data, and the methods for estimating uncertainty are utilized to compute scores of 
uncertainty. The top-k selection for video-based data can be performed using two distinct approaches. 


619 


23. PROCEEDINGS 


3.3.1 Sequence-wise selection 


The sequence-wise (SW) selection method involves selecting the whole video or sequence. Since uncertainty 
estimation is performed on individual frames, a sampling strategy is employed to aggregate the frame-wise 
uncertainty estimates into sequence-wise estimates. These sequence-wise uncertainty estimates are then ranked, 
and the top k sequences are selected. However, this approach may be influenced by a limited number of uncertain 
frame samples within a video, leading to a bias towards those samples. 


3.3.2 Frame-wise selection 


The frame-wise selection (FW) method directly ranks the frame-wise uncertainty estimates, and the top-k frame 
samples are selected. This approach eliminates the need for a sampling strategy, as the frame-wise uncertainty 
estimates are directly considered. 


After the top-k selection, the samples or videos selected from the pool set are added to the training set and removed 
from the pool set as illustrated in Fig. 1. This whole process is repeated in an iterative manner. 


4. EXPERIMENTATION 


We conduct different experiments to evaluate and compare the performance of various uncertainty estimation 
methods in the context of Active Learning. Inception-vl (Szegedy, et al. 2014) was used as a model backbone 
architecture and the input video was resized into the shape of (640, 360) grayscale images. We use Adam optimizer 
while training the model. 


4.1 Evaluation Metric 


In general, object detection algorithms are evaluated using mean Average Precision (mAP) which requires the 
computation of Intersection over Union (IoU) over the bounding boxes. But the model implemented in the research 
does not provide bounding box information as an output of the model, but rather a patch-wise classification. Hence, 
the Receiver Operating Characteristic (ROC) (Streiner & Cairney, 2007) curve has been adopted for performing 
the model evaluation. ROC curve is computed by evaluating the predictions of the model over different thresholds, 
and plotting True Positive Rate (TPR) and False Positive Rate (FPR) for each threshold in a curve. As in realistic 
applications, the FPR should be very low in order to avoid a large number of false alarms. Hence, the Area Under 
the ROC Curve value under the threshold of 2% FPR (AUC2%) is evaluated and used as a metric to evaluate 
different methods. 


4.2 Uncertainty Estimation Comparison 


We performed an analysis of the cost comparison for uncertainty estimation for videos ranging from 1 to 2000. As 
depicted in Fig. 4, the initial cost of deep ensembles is substantially greater than that of other methods. The State- 
of-the-Art FacerDir method exhibits a slightly greater initial cost relative to the conventional MCD, MCDB, and 
TTA approaches. However, for a higher number of video estimations, the method proved to be substantially more 
efficient. 
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Fig. 4 Time comparison for uncertainty estimation for different methods. The plot represents the time required to 
estimate uncertainty using different methods over 2000 cumulative videos. 
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The novel proposed PhDm and BiasedMSD-based uncertainty estimation methods exhibit the least overhead time 
overall. The FacerDir, PhDm, and BiasedMSD-based uncertainty estimation methods exhibit considerably low 
overhead costs, indicating a promising opportunity to advance toward the subsequent phase of incorporating an 
active learning for data curation. 


4.3 Baselines for Data Curation 


Active learning is performed iteratively till the quarter of the informative/uncertain dataset is sampled. The model 
is later trained on the sampled informative training data and the following baselines are used for comparing the 
model performance: 


4.3.1 Baseline for pool data curation 


Pool data curation is one of the main purposes of active learning, while the most informative data is curated for 
annotation using a sampling strategy. The function a(x) is defined as a draw from a uniform distribution over the 
interval [0, 1] using the function (Gal, et al. 2017). This criterion ensures that the selection strategy for acquiring 
annotated data is superior to a random selection approach and is used in majority of scientific studies. 


4.3.2 Baseline for train data curation 


Active learning is seldom used in the literature to curate the training data as it is always seen as a method to curate 
the unknown pool data. In our research, an active learning approach was utilized to curate the training dataset with 
the objective of mitigating potential implicit biases that may exist within the data. The assessment and comparison 
of various approaches are based on the performance of the primary network, which is trained on the complete 
dataset utilized for active learning experimentation. In the event that the active learning technique produces a 
subset of the training dataset that surpasses the primary network’s performance, it can be utilized for training data 
curation in a general context. 


4.4 Active Learning for Data Curation 


We conduct a series of experiments on three different data filter sizes viz. 15, 30, and 60 randomly filtered frames 
per sequence (fpseq). It is in-feasible to perform the experiments on the full internal dataset and hence, for research 
experimentation three different data filter sizes were randomly developed. It is important to note that the 
comparison of the metric over different data sizes internally is not possible as the data was randomly selected. 


The uncertainty estimation was performed using PhDm, BiasedMSD, and FacerDir. The uncertainty estimation 
was iteratively used in an AL setting to sample top uncertain samples. The uncertain samples were iteratively added 
and removed from the training and pool dataset, respectively. The uncertainty sampling for PhDm and FacerDir- 
based approaches were done using Sequence-wise (SW) and Frame-wise (FW) querying strategies. Fig. 5 
represents the comparison of the performance of different experiments over various data sizes. 
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Fig. 5: AL method performance comparison for different methods applied over various filtered data sizes (bold 
bar represents best performing method). FW and SW represent Frame-wise and Sequence-wise querying 
strategies. 
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5. DISCUSSION 


Fig. 4 illustrates the overhead cost associated with uncertainty quantification. It can be observed that the utilization 
of PCA and biased differencing-based techniques resulted in relatively shorter estimation times. This observation 
suggests that the proposed novel methods in the paper exhibit significantly lower overhead uncertainty estimation 
costs in comparison to the conventional and State-of-the-Art distillation approaches. 
AL methods comparison 
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Main Network 
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Methods 
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Fig. 6 Performance comparison of different AL methods. The random network is the baseline for pool data 
curation and the main network is the baseline for training data curation. The novel PhDm approach performance 
outperformed the random and main network baseline. 


It is evident from Fig. 6 that the novel PCA-based method outperformed other AL approaches in the context of 
training and pool data curation. The overall comparison of different AL approaches is illustrated in Fig. 6 . It is 
evident from the figure that the frame-wise selection approach was generally better than the sequence-wise 
querying strategy. The frame-wise selection of images from the videos proved to perform efficiently compared to 
selecting the whole sequence at a time using a sequence-wise selection strategy. The frame-wise querying strategy 
improved the performance of the PCA-based active learning method by 7.33% and the FacerDir approach by 
0.67%. This finding can be utilized to crop informative frames from different excessively long videos, which is an 
intriguing application of the active learning strategy. In general, every method outperformed the Random baseline. 
This indicated that every method performed well in the context of pool data curation and annotating the pool data 
using uncertainty quantification. 


In every experiment conducted using varying data sizes, the novel PhDm approach outperformed the main network, 
despite utilizing only 26.39% of the available data for model training. The approach improved the performance of 
the model architecture compared to the random baseline with 16.33% of AUC2% evaluation metric. The approach 
even improved the main network performance by 7.33% and outperformed other approaches. These findings 
suggest that the PhDm approach can achieve superior performance with significantly fewer training data as 
compared to the main network and is an approach with a very low overhead uncertainty quantification cost. 


BiasedMSD selects challenging data instances that the model struggles to comprehend as uncertain samples. 
Therefore, this approach is expected to yield superior uncertainty estimation results and excel in AL scenarios. By 
including image samples that the model has identified as false positives, the approach can effectively enhance data 
quality. The BiasedMSD approach improved main network performance by 5.66%, however, the improvement 
was not as evident as seen in the PhDm approach. The performance of FacerDir-based active learning was found 
to improve the network by 7.33% than the random baseline compared to a significant 16.33% improvement using 
the novel PCA-based hard data mining approach. 


Recapitulating the results observed in the experiments, the feature-based novel approach outperformed other 
approaches in the context of curating pool and training data using AL. This is attributable to the fact that the 
feature-based approach captured more information about the samples than sampling a probability distribution over 
several forward passes. 
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6. CONCLUSION 


This paper focuses on the various uncertainty estimation techniques for the object detection-based fire detection 
problem with an aim to curate the data. We proposed two novel approaches for curating the data using active 
learning. The novel PhDm and BiasedMSD approaches performed uncertainty estimation efficiently compared to 
the sampling-based methods and the State-of-the-Art FacerDir. The novel approaches were also found to 
outperform other benchmark methods in the task of curating the unlabeled data. PhDm was found to be the most 
efficient method for curating and improving the training data by decreasing the size of the training. 


Finally, we put forward potential avenues for future research exploration. Different uncertainty and active learning 
experiments were performed in the setting of binary classification. The experimentation on multi-class 
classification problems can be done using the novel approaches stated in the literature. The novel AL techniques 
can be further evaluated on various publicly available datasets. 
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ABSTRACT: The use of closed-circuit television (CCTV) for safety monitoring is crucial for reducing accidents 
in construction sites. However, the majority of currently proposed approaches utilize single detection models 
without considering the context of CCTV video inputs. In this study, a multimodal detection, and depth map 
estimation algorithm utilizing deep learning is proposed. In addition, the point cloud of the test site is acquired 
using a terrestrial laser scanning scanner, and the detected object's coordinates are projected into global 
coordinates using a homography matrix. Consequently, the effectiveness of the proposed monitoring system is 
enhanced by the visualization of the entire monitored scene. In addition, to validate our proposed method, a 
synthetic dataset of construction site accidents is simulated with Twinmotion. These scenarios are then evaluated 
with the proposed method to determine its precision and speed of inference. Lastly, the actual construction site, 
equipped with multiple CCTV cameras, is utilized for system deployment and visualization. As a result, the 
proposed method demonstrated its robustness in detecting potential hazards on a construction site, as well as its 
real-time detection speed. 


KEYWORDS: deep learning, multimodal, multiCCTV, synthetic data, pointcloud 


1. INTRODUCTION 


Construction sites are dynamic and complex environments that pose significant safety risks, resulting in an high 
rate of accidents and fatalities worldwide (Abdelhamid & Everett, 2000). Consequently, implementing effective 
safety monitoring measures is vital for reducing such incidents. The use of closed-circuit television (CCTV) 
cameras for safety monitoring on construction sites has played a crucial role in mitigating risks. Despite this, the 
full potential of CCTV data is often underutilized, primarily due to the majority of existing approaches employing 
single detection models without considering the full context of CCTV video inputs (Park et al., 2022, 2023; Tran 
et al., 2020). In response to this issue, this study proposes a novel and robust system that incorporates a multimodal 
detection and depth map estimation algorithm, utilizing the power of deep learning. The distinctiveness of our 
approach lies in the context-aware analysis, providing a more comprehensive understanding of the potential 
hazards present within the dynamic environments of construction sites. Furthermore, our proposed method goes 
a step further by leveraging terrestrial laser scanning technology to acquire the point cloud of the test site and 
utilizing a homography matrix to project the detected object's coordinates into global coordinates. This step 
enhances the overall monitoring system's effectiveness by visualizing the entire monitored scene, thus providing 
a bird eye view of the potential hazards. A crucial aspect of any newly proposed system is rigorous validation. 
For our method, we have created a synthetic dataset of construction site accidents using Twinmotion, a high- 
powered graphic software. This dataset provides a range of simulated scenarios to test the precision and inference 
speed of our proposed method, ensuring its reliability and robustness in varied contexts. 


Finally, we further validate our method by deploying it on an actual construction site equipped with multiple 
CCTV cameras, moving beyond simulations to a real-world setting. This on-site implementation allowed us to 
assess the practicality of our system and its ability to function optimally in an uncontrolled, real-world 
environment. The results from both simulation and real-world deployment demonstrate our proposed method's 
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robustness in detecting potential hazards on a construction site. This paper aims to highlight the potential of 
multimodal detection approaches in enhancing construction site safety measures, moving towards a future where 
such hazards can be preemptively detected and effectively mitigated. In the following sections, we will provide a 
detailed explanation of our proposed method, its development, and the validation process. We will also present 
the results and implications of this study, demonstrating how a deep learning-based multimodal approach can be 
used for safety monitoring in the construction industry. 


2. BACKGROUND 


This section provides the necessary background on the key aspects of our methodology, namely object detection, 
depth estimation, and multimodal synchronization. These three components collectively constitute the core of our 
proposed method and are fundamental in understanding the context-aware safety monitoring approach. 


2.1.Object Detection 


Object detection, as a fundamental component of computer vision, has undergone significant developments over 
the past decade, thanks to the advancements in deep learning and convolutional neural networks (CNNs) (Li et 
al., 2021). This involves identifying and locating objects within images or video feeds. In construction sites, object 
detection can identify critical elements such as workers, machinery, tools, and other potential hazards, thereby 
playing a vital role in safety monitoring. However, traditional object detection models typically operate 
independently, failing to incorporate the broader context of a scene (Jeon et al., 2023; Tran et al., 2022). These 
models often struggle with complex environments like construction sites, where multiple objects interact 
dynamically, and understanding these interactions is crucial for effective hazard detection. This limitation forms 
the motivation for our study, aiming to integrate a higher level of contextual understanding into object detection 
models using deep learning algorithms. In this research, two state of the art object detection models are utilized: 
Yolov8 (Redmon et al., 2016) and RTMDet (Lyu et al., 2022). These two object detectors are trained and validated 
in actual CCTV images and utilized for incorporation with other models. 


2.2.Depth Estimation 


For understanding context of input image, spatial information is crucial, therefore, a depth estimation model 
MiDAS (Ranftl et al., 2020) is utilized. Depth estimation refers to the task of determining the relative distance of 
objects within a scene from the viewpoint of the camera. It is a crucial component in understanding three- 
dimensional spaces from two-dimensional images or video feeds, providing invaluable information about the 
positioning and interaction of objects within a scene. In construction sites, depth estimation can enhance the 
understanding of spatial relations among various elements, such as the proximity ofa worker to a moving machine, 
thereby aiding in detecting potentially hazardous situations. This paper incorporates depth estimation into our 
proposed multimodal approach, further improving the context-awareness of the system. 


2.3.Multimodal Synchronization 


Incorporating multimodal synchronization into safety monitoring has the potential to significantly enhance the 
effectiveness of hazard detection. For example, a hazard that is not visible from one camera angle may be clearly 
observable from another. Similarly, certain hazardous situations may only be identifiable when considering 
multiple factors, such as the positioning and motion of various objects, which could be obtained from different 
data sources. In our proposed method, we aim to utilize multimodal synchronization to integrate object detection 
and depth estimation data, along with point cloud data obtained through terrestrial laser scanning. This 
synchronization allows for the projection of detected objects into global coordinates, enhancing the visualization 
of the entire monitored scene and thus the system's overall effectiveness. In the subsequent sections, we will 
elaborate on the implementation of these components in our proposed method and demonstrate their effectiveness 
in enhancing construction site safety monitoring. 
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3. PROSED APPROACH 


3.1.Proposed approach 


Figure | depicts the proposed method as follows: First, the input image is predicted using object detectors that 
have been pretrained. In this study, an object detector that can infer six classes is described in detail, along with 
the training procedure and training dataset. In addition, the MiDAS model is used to estimate the depth map using 
a previously trained depth estimation model. The Euclidean distance between objects is then estimated and 
visualized with the given depth and bounding box coordinates, after which each object's depth map is extracted. 
This research considers the distance between construction machines and employees, as well as the module to 
estimate whether or not a worker is in the danger zone. In addition, work-related personal protective apparatus is 
trained and implied. 
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Figure 1. Proposed Approach 


3.2.Dataset Acquisition 
The training dataset contains bounding boxes objects from 6 classes: normal worker, signalman, harness, hardhat, 


mixer truck and excavator. As visualized in Figure 2, each object is labeled in detail from actual CCTV footage. 
Figure 3 presented a training and testing dataset classes distribution. 
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Figure 2. Dataset visualization 
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Figure 3. Training and testing dataset distribution. The y-axis depicts the total of each class in training and testing 


datasets. The x-axis lists the distinct class names or labels in dataset. In our case, these are: “worker”, “mixer”, 
“signalman”, “hardhat”, “harness”, and “excavator”. 


For evaluate model performance, the detected objects are categorized into 3 types: "small", "medium", and "large" 
follows a specific definition based on the number of pixels occupied by an object in an image. Specifically, objects 
are categorized as follows: 


e — "Small": Objects occupying 0 to 1024 pixels, which translates to dimensions up to 32 x 32 pixels. 

e "Medium": Objects occupying between 1024 and 9216 pixels, which corresponds to dimensions 

between 32 x 32 and 96 x 96 pixels. 

e "Large": Objects occupying more than 9216 pixels, equating to dimensions of 96 x 96 pixels or larger. 
With these categorizations, our dataset reflects an essential aspect of object detection tasks: dealing with objects 
of varying sizes. The balance or imbalance of different categories may significantly affect the predictive 
performance of an object detection model. In our dataset, categories such as "mixer" and "excavator" contain a 
substantial number of "medium" and "large" objects. Conversely, the "hardhat" category is predominantly 
composed of "small" objects. This imbalance suggests that an object detection model trained on this data might 
develop a bias toward detecting "medium" or "large" objects and underperform on "small" objects. 


4. Experimental 


4.1.Quantitative 


Table 1. Object detectors performance. 


mAP mAP 50 mAP75 mAPs mAPm mapi FPS 
YOLOv8X 0.689 0.831 0.741 0.123 0.733 0.811 i 
RTMDet 0.628 0.776 0.688 0.059 0.641 0.756 79.964G 
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In order to quantify the results, the mean average precision (mAP) is employed as an evaluation metric. The mAP 
is a widely recognized indicator used to quantitatively assess the performance of object identification models. The 
research provides a comprehensive explanation of the mAP (Zhao et al., 2019). As presented in Table 1, 
YOLOv8X and RTMDet, with regard to their performance metrics, a notable variance in their effectiveness 
becomes clear. The YOLOv8X model exhibits superior precision with a mAP score of 0.689, which considerably 
exceeds the 0.628 mAP score of the RTMDet model. This indicates an overall higher rate of accurate detections 
by the YOLOv8X model. Further disparity can be observed at varying Intersection over Union (IoU) thresholds. 
The mAP_50 and mAP_75 scores, which represent the mAP values computed at IoU thresholds of 0.50 and 0.75 
respectively, demonstrate a superior adaptability of the YOLOv8X model to changes in detection difficulty levels. 
When checking the mAP values across different object sizes, denoted by mAP_s (small), mAP_m (medium), and 
mAP 1 (large), the YOLOv8X model continues to display superior performance. The model's proficiency in 
detecting small objects is particularly noteworthy, with a score of 0.123 compared to the RTMDet's score of 0.059. 
Nevertheless, it is crucial to consider the computational complexity of the models. RTMDet has a significant 
advantage in this regard, with a computational demand of 79.964G Flops, markedly lower than the YOLOv8X's 
0.129T (or 129,000G) Flops. This positions RTMDet as a more feasible option for applications with limited 
computational resources, despite its inferior mAP performance. While the YOLOv8X model outperforms 
RTMDet in terms of object detection performance across various metrics, the latter's significantly lower 
computational demand may make it a more suitable candidate for resource-constrained applications. The selection 
between these two models, therefore, necessitates careful consideration of the balance between performance 
efficiency and resource utilization, contingent on the specific requirements of the application. 


4.2.Qualitative 


As mentioned in previous sections, after detecting objects, the depth map is estimated and calculating the distance 
between worker and construction vehicles. As can be seen from Figure 4, by utilizing spatial information, the 
distance can be estimated and from that, the necessary warning can be conducted. 


Figure 4. Distance estimation using depth estimation and object detection. 


Figure 5 showed another application of the proposed approach by identifying which workers are in the danger 
area. The danger area is defined by the safety officer, and when detected worker violate that area, the number of 
violated cases will be shown and reported directly to safety manager. Along with detecting worker, a PPE 
detection models also consider to utilized, as can be seen from Figure 6, both hardhat and harness is detected for 
checking. To reduce the false positive, we remove detected PPE outside of the worker detected area, as can be 
seen in Algorithm 1. 


Algorithm 1. Detecting PPE 

Lasxw=#Lpdjh#L# 

Rxwsxw=#Glvsod |hgłerxqglqj#er {hv#ri#ghwhfwhgłvdihw | #htxlsphqwirg#shuvrav# 
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5=#HHHshuvrgPrgho#? 0#Lg1lwldol }h#suhwudlghg#prgho#iru#shuvrg#ghwhtwl rat 
6=HHHHHHVdihw | Htx1 sphqwPrgho#? 0#Lqlw1dol }h#rxu#prgho#irutkdugkdwidgg#kdughvv#ghwhtwl rot 
T= 
8=HHHHHEShuvrgErxqglqjEr {hv#?0#shuvrgPrgholghwhtw+L, # 
9=tHHHHVdihw | Htx1 sphqwErxqglqjEr {hv#? 0#vdihw|Htx1sphqwPrgholghwhtw+L, # 
> STHHHHHT 
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43 =}HHHHHHHHHHHHET a X#? O#FDOFXODWHDLRX+shuvrger { /#htx1lsphqwEr { , # 
AAH HHR HHRHH 
45 =}HHHHHHHHHHHHE] THT oe XHAH3 18Hwkhat 
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Figure 5. Detect workers in the danger area 
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Figure 6. PPE detection 


Finally, for developing multi-CCTV and bird eye view (BEV) synchronization. Two cases of examples are 
conducted as follows: first, by utilizing TwinMotion, the 4 CCTV channel is developed and used as an input for 
detection models, with homography matrix, the BEV is shown in Figure 7. 
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Channel 3 


Channel 2 
Figure 7. Multi CCTV with BEV in Twinmotion 


Similarly with above example, but in actual construction site, it difficult to obtain BEV image, that why TLS is 
utilized for scanning and from that BEV can be estimated as shown in Figure 8 and 9. However, by only projecting 
all detected object into BEV, the exact ID of objects vary channels to channels. Therefore, the application of multi 
object detection and tracking can be used for future research. The expected output is given multi-channel CCTV, 
the output is the BEV with exact number and ID of detected objects. This study mostly emphasizes qualitative 
experiments. In order to obtain quantitative results, the forthcoming experiment will be undertaken by establishing 
an indoor environment and thereafter measuring the error projection using a meter. 


Figure 8. Multi CCTV with BEV in actual construction site 
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Figure 9. Multi CCTV with BEV in actual construction site 


5. CONCLUSION 


By adopting a multimodal detection approach, the study proposes a depth map estimation algorithm designed to 
enhance the contextual understanding of video inputs from CCTV. The proposed method uses terrestrial laser 
scanning to generate a point cloud of the test site and leverages a homography matrix to project detected objects 
into global coordinates. With object detection models used in the proposed method, a detailed analysis of 
YOLOv8sX and RTMDet was conducted. YOLOv8X exhibited superior precision across various measures, 
including overall mAP, mAP at varying IoU thresholds, and mAP across different object sizes. However, the 
RTMDet model was identified as more resource-efficient, demanding significantly fewer computational resources 
despite its lower mAP performance. This research also presented some use cases of multimodel detections, it 
proves that context-aware approach in safety monitoring is important and should be considered for further 
research. 
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ABSTRACT: Fall from height (FFH) is one of the major causes of injury and fatalities in construction industry. 
Deep learning-based computer vision for safety monitoring has gained attention due to its relatively lower initial 
cost compared to traditional sensing technologies. However, a single detection model that has been used in many 
related studies cannot consider various contexts at the construction site. In this paper, we propose a deep learning- 
based pose estimation approach for identifying potential fall hazards of construction workers. This approach can 
relatively increase the accuracy of estimating the distance between the worker and the fall hazard area compared 
to the existing methods from the experimental results. Our proposed approach can improve the robustness of 
worker location estimation compared to existing methods in complex construction site environments with obstacles 
that can obstruct the worker's position. Also, it is possible to provide information on whether a worker is aware of 
a potential fall risk area. Our approach can contribute to preventing FFH by providing access information to fall 
risk areas such as construction site openings and inducing workers to recognize the risk area even in Inattentional 
blindness (IB) situations. 


KEYWORDS: deep learning, keypoint detection, pose estimation, computer vision, construction site safe 


1. INTRODUCTION 


Due to numerous hazards and safety challenges, the construction industry stands out as highly dangerous, 
characterized by elevated rates of accidents and injuries. Among these, falls from heights (FFH) emerge as a 
particularly frequent and urgent concern, often leading to severe injuries or fatal outcomes. These incidents 
underscore the inherent risks associated with construction activities, contributing to delays and economic setbacks 
(Rafindadi et al., 2022). 


Despite the stringent enforcement of safety standards, comprehensive worker training, and the adoption of 
advanced protective equipment, FFH-related accidents persist at an alarming rate. A closer examination reveals 
that these mishaps frequently result from worker negligence, inadequate situational awareness, or an inability to 
recognize impending dangers (Golparvar-Fard et al., 2013). The dynamic and ever-evolving nature of construction 
sites further exacerbates these challenges, rendering many traditional safety measures ineffective. 


Historically, human oversight and routine inspections have been the primary means of safety supervision in 
construction settings. However, these methods, being inherently subjective, often result in inconsistent safety 
assessments. Recognizing these limitations, there's been a shift towards leveraging emerging technologies such as 
computer vision and artificial intelligence (AI). While these innovations promise objective, consistent, and real- 
time safety evaluations, challenges remain. Specifically, detecting hazards like floor openings becomes complex 
due to occlusions from construction materials and scaffolding. Additionally, determining a worker's position, 
especially when parts of their body are obscured, remains problematic. 


In light of these challenges, this study proposes a novel approach, integrating computer vision and deep learning, 
tailored for construction site safety evaluations. The essence of our methodology lies in the fusion of quadrilateral 
detection, pose estimation, and single depth estimation. Quadrilateral detection accurately captures the contours 
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of target objects, pose estimation provides insights into their spatial orientation, and single depth estimation refines 
the distance measurements. By employing quadrilateral anchors, further enhanced by the Vision Transformer, our 
approach aims to provide a more robust and accurate tool for hazard detection and prevention. 


The structure of this paper is as follows: 


© Chapter 2 delves into relevant literature, providing a thorough assessment of existing approaches, their 
strengths, and inherent limits. 


© Chapter 3 elucidates our proposed methodology, shedding light on its unique facets and potential 
advantages. 


© Chapter 4 includes experimental results, including quantitative assessments as well as visual 
representations of our findings. 


© Chatpter 5 concludes the paper by summarizing the paper and suggesting future avenues for research. 


2. RELATED WORK 


The construction industry has long grappled with the challenge of ensuring worker safety, especially in the context 
of falls from heights (FFH) (Helander, M. 1980). This section delves into the existing methods and recent 
advancements in recognizing unsafe areas and the application of deep learning in the construction site context. 


2.1. KFFH-related Safety Monitoring 


Traditional safety measures, such as guardrails and safety nets, have been the primary defense against FFH 
incidents (Zhang, M. and Fang, D. 2013). However, the dynamic and complex nature of construction sites often 
renders these measures insufficient. The rapidly changing environment, coupled with the diverse nature of 
construction tasks, necessitates more advanced and adaptive safety solutions. 


Historically, the realm of automated construction safety has leaned heavily on sensor-based mechanisms. 
Techniques like radio frequency identification (RFID) were the go-to solutions for monitoring workers' movements 
into potentially hazardous zones (Costin, et al., 2012). Similarly, tools like global positioning systems (GPS) and 
ultra-wideband (UWB) played pivotal roles in identifying unsafe regions and pinpointing the location of workers 
and materials (Pradhananga, N. and Teizer, J. 2013). However, the drawback of these methods was the necessity 
for individual sensor installations. 


The modern era has seen a surge in the exploration of computer vision combined with deep learning as potential 
reasonable approach in safety area (Park, et al., 2020; Jeon, et al., 2023). Recent advancements in technology are 
poised to significantly transform the paradigm of worker safety through the automation of risk assessments. 
Noteworthy progress has been achieved in the application of computer vision to identify potential hazards (Tran, 
et al., 2022). However, some studies exhibit limitations in their scope, especially regarding the precise localization 
of dangers and the evaluation of workers' proximity to such hazards. In sight of these observations, there's a clear 
demand for a more refined approach that not only pinpoints hazards with precision but also factors in worker 
proximity in real-world scenarios. 


2.2. Deep Learning and Computer Vision in Construction Environments 


The dynamic and cluttered nature of construction sites poses unique challenges for deep learning models. The 
presence of obstructions like construction materials and scaffolding often disrupts the model's ability to accurately 
identify target objects. A popular solution, borrowed from interdisciplinary research, is the integration of attention 
modules. These modules enhance the model's feature extraction capabilities, emphasizing crucial aspects of images. 


Recent research has showcased the potential of attention mechanisms in improving detection accuracy, especially 
in environments where objects are either partially hidden or appear smaller due to perspective. For instance, several 
works refined the triplet attention mechanism, assigning greater significance to vital features. This refinement 
allowed models to zero in on specific image sections, enhancing the accuracy of worker detection, even in intricate 
construction settings. Their model could pinpoint a worker's approximate location, even if they were partially 
obscured. Another noteworthy approach is the use of distance intersection over union (DIoU) based on non- 
maximum suppression (NMS). This technique distinguishes between overlapping objects against the complex 
backdrop of construction sites. 
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However, these methods are not without their limitations. For instance, if a worker's lower body is entirely 
obscured, the detection only captures the visible sections. This makes it challenging to determine a worker's exact 
position based solely on bounding box coordinates. 


Enter pose estimation, a more adaptable solution to traditional object detection challenges posed by occlusions. 
Recent advancements in the intersection of construction safety and computer vision have highlighted its promise. 
By leveraging pose estimation, researchers have been able to identify unsafe work postures and ensure the proper 
use of safety equipment. 


Despite its potential, few have explored pose estimation to determine construction worker locations, especially 
considering obscured body parts. Moreover, basic detection methods, which don't account for distance, only 
identify the presence of workers and hazards without gauging the relative proximity between them. This limitation 
hampers their ability to issue timely warnings to workers approaching danger zones. 


The Vision Transformer model, applied to depth estimation, demonstrated the capability of assessing object depths 
and distances with a singular camera (Ranfil, et al., 2021). However, with increasing distances, the differentiation 
in depth becomes less discernible, potentially limiting its utility for comprehensive construction site surveillance. 
In conclusion, there is a pressing need for cost-effective computer vision techniques that can not only pinpoint 
worker locations but also preemptively warn them about impending hazards. 


3. METHODS 


To address the identified challenges, we propose a comprehensive methodology that leverages advanced computer 
vision and deep learning techniques. Our methodology consists of two main components: detection of floor 
openings using a quadrilateral-anchor-based Vision Transformer, and estimation of worker positions using a pose 
estimation approach. 


3.1 Baseline Detection Architecture 


Given the urgent and real-time nature of risk management on construction sites, it is imperative to employ a model 
that can provide immediate and accurate monitoring. YOLOv7 is boasting high accuracy while maintaining real- 
time capabilities in real time detection (Wang, et al., 2023). While minor trade-offs in FPS might occur, extending 
YOLOv?7 with additional models presents an avenue for enhancing accuracy. In view of these considerations, 
YOLOv7 was chosen as the foundational model for our study. 


3.2 Attention Mechanism 


The introduction of the attention mechanism has proven transformative in object detection tasks. By dynamically 
emphasizing salient image features while downplaying less informative ones, models exhibit enhanced capacity 
for accurate object detection and classification, all while preserving the efficiency of the detection process (Park, 
et al., 2022; Guo, et al., 2022). Traditional attention methods, including squeeze-and-excitation (SE) (Hu, et al., 
2018) and convolutional block attention mechanism (CBAM) (Woo, et al., 2018), utilize convolutional neural 
networks to recalibrate feature maps, enhancing model accuracy and robustness by prioritizing meaningful 
information over less relevant components. 


3.3 Polygon Anchor for Object Detection with Vision Transformer 


The detection of floor openings is achieved through a convex quadrilateral-anchor-based Vision Transformer. This 
approach allows for accurate detection of floor openings, even when the camera perspective is not aligned with 
the floor opening as shown in Figure 1. The quadrilateral anchors allow for more flexibility in defining the 
bounding boxes, enabling them to closely match the actual boundaries of the floor openings. The Vision 
Transformer is used to extract both the global dependencies between the floor openings and other parts of the 
building and the local features specific to the floor openings. This combination of global and local feature 
extraction enhances the detection performance of the floor openings. 
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These two components, when combined, allow for a comprehensive, real-time monitoring of safety on construction 
sites. They enable not only the detection of floor openings but also the identification of unsafe zones and the real- 
time tracking of worker positions. This comprehensive safety monitoring can significantly enhance the safety at 
construction sites. 


Fig. 1: Using convex quadrilateral anchors to detect floor openings. The purple boxes represent the ground truth, 
while the green boxes represent the predicted boxes. The overlapping area between the two boxes, indicated by 
yellow dots, is used to calculate the IoU (Intersection over Union). 


3.4 Integrated Model for Floor-Opening Detection 


we incorporate the YOLO-pose model (Maji, et al., 2022), which enhances YOLO capability to predict human 
poses. This model leverages pose estimation to refine object localization, offering a more comprehensive 
understanding of worker positions and orientations in relation to floor openings. This integration aims to amplify 
the focus on critical features and supply a more comprehensive contextual representation of images. This 
augmentation potentially heightens the efficacy and precision of our object detection model. 


The envisioned integrated model aims to capture both local attributes and global interdependencies of floor 
openings. Local characteristics encompass attributes like shape, size, and color, facilitating nuanced 
comprehension and enabling their integration with other objects. Meanwhile, global interdependencies offer a 
holistic understanding of image constituents. For instance, floor openings often correlate with specific building 
elements, such as slabs, and a global outlook can encapsulate contextual information on a broader scale. This 
approach not only bolsters the perception of individual components but also provides insights into the overall 
building structure and floor opening placements. 


3.5 Estimation of Relative Distances Between Workers and Danger Zone 


Ensuring safety on construction sites entails assessing the proximity of workers to potential hazards, in this case, 
openings or drop-offs. Our method employs a combination of pose estimation and depth estimation to achieve this. 


Firstly, we leverage the pose estimation of workers, specifically focusing on the leg parts, to gauge the distance, 
denoted as D. The rationale behind this focus is that the legs are often the closest body parts to openings or drop- 
offs and thus serve as critical indicators of a worker's proximity to these hazards. 


Once D is determined, we then identify areas within a radius of D from detected openings as "hazard zones." Any 
worker located within this zone is considered at risk, warranting immediate safety interventions. Also, In the 
context of drop-offs, we utilize single depth estimation. When a detected individual's depth estimation result 
exhibits a drastic change, indicating proximity to areas with significant depth differences, they are deemed to be 
in a hazardous zone. Essentially, if a worker is close to an area where the depth changes abruptly, it is considered 
a potential fall hazard. 
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By integrating pose and depth estimations, our approach provides a comprehensive measure of potential risks, 
enabling proactive safety measures on construction sites. 


Predicted 
Unsafe Safe 


TP 
(283) 


Unsafe 


Real Safety Status 
Safe 


Fig. 2: The results of confusion matrix for the safe/unsafe decision making in construction site. 


4. RESULTS 


For our experiments, data were collected from both real construction sites and 3D simulation models. We 
specifically gathered images and videos of workers situated near openings and drop-offs where safety measures 
were not adequately implemented. In total, 3545 datasets were collected. We then employed a training-validation- 
test schema with a split ratio of 3:1:1, respectively, to evaluate our model's performance. The proposed method not 
only improves the detection of floor openings but also estimates the relative distance between the workers and the 
openings. The method defines unsafe zones around the openings based on this relative distance and provides real- 
time warnings to the workers when they enter these zones. The experiments also demonstrate the robustness of the 
proposed method in handling the complex and dynamic environment of construction sites. 


The quantitative accuracy of the alert for FFH prevention at openings and edges from the proposed method is 
represented in Fig. 2, visualized using a Confusion matrix. We used a confusion matrix to assess the performance 
of our model in predicting whether a worker is in a hazardous zone or a safe zone. In real construction sites, due 
to the inherent nature of the environment, most of the samples were from safe zones, which explains the higher 
number of safe situation. By testing in both real and virtual environments, we ensured a comprehensive evaluation 
of our model's performance across diverse scenarios. Our model achieved an accuracy of approximately 93.2%, 
representing the proportion of total predictions that were correct. The precision of the model was 93.1%, indicating 
that when our model predicted a worker to be in a hazardous zone, it was correct about 93.1% of the time. The 
recall was valued at 90.7%, showcasing that our model correctly identified 90.7% of all actual hazardous situations. 
Harmonizing precision and recall, the Fl-Score was found to be 91.9%. Fig. 3 visualizes the inference results of 
the models used in the proposed methodology. The final decision on risk/safety is automatically determined 
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through post-processing from a combination of three algorithms. 


(c) 


Fig. 3: Results of computer vision models in simulated hazardous situations for openings and edges using Unity. 
(a) Original image, (b) Pose estimation results, (c) Depth estimation results from a single camera, (d) Polygon 
detection results for floor openings. 


5. CONCLUSIONS 


This study presents a pivotal advancement in automated safety monitoring within the construction domain. By 
harnessing state-of-the-art computer vision and deep learning paradigms, our method adeptly detects floor 
openings and estimates the proximity of workers, facilitating real-time risk assessment. The potential to preempt 
accidents and heighten safety through this methodology is profound. 


However, our study acknowledges certain limitations. Notably, while our method is primed for detecting risks, it 
may register false positives, especially in scenarios where floor openings function as stairs or access points, and 
workers are expectedly moving in and out. Conversely, false negatives can manifest when workers operate beyond 
a specific distance from the camera, rendering them undetectable. As per the feedback, it's crucial to clarify that 
we haven't explicitly stated or tested the exact distance threshold on-site, which denotes the limit beyond which 
workers might not be detected. These limitations point to the need for further research and improvement in our 
method. Future work should aim to address these issues, as well as incorporate considerations of existing safety 
measures on construction sites to provide more nuanced safety monitoring information. 
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ABSTRACT: According to the Ministry of Manpower, falling and slipping accidents are one of the most common 
accidents in addition, falls from heights (FFH), including accidents during scaffolding work, are still a major 
cause of death in the construction industry. Regular safety checks are currently being carried out on construction 
sites, but scaffold-related accidents continue to occur. Sensing technology is being attempted in many industrial 
sites for safety monitoring, but there are still limitations in terms of the cost of sensors and object detection, which 
are limited to certain risks. Therefore, this paper proposes a deep learning-based pose estimation approach to 
identify the risk of falling during scaffolding work in the construction industry. Through analysis of the correlation 
between unstable behavior during scaffold work and the angle of keypoints of workers, the proposed approach 
demonstrates the ability to detect the risk of falling. The proposed approach can prevent falling accidents not only 
by detecting construction site workers, but also by detecting specific risky behaviors. In addition, in limited work 
environments other than scaffolding work, the information on unstable behavior can be provided to safety 
managers who may not be aware of the risk, thus contributing to preventing falling accidents. 


KEYWORDS: deep learning, pose estimation, keypoint angle calculate, construction site safe monitoring, falls 
from heights 


1. INTRODUCTION 


According to the Ministry of Manpower, falling and slipping accidents are one of the most common accidents in 
addition, falls from heights (FFH), including accidents during scaffolding work, are still a major cause of death in 
the construction industry. Working at height on construction sites is associated with a major risk of falls that must 
be properly managed to prevent injury and death. 


Statistics from the U.S. Department of Labor in 2021 indicate that nearly one in five workplace fatalities occurred 
in the construction industry, with over a third of these attributed to falls, slips, and trips. The construction industry 
was responsible for 46.2% of all deaths from falls, slips, and trips that year(U.S. BUREAU OF LABOR 
STATISTICS, 2023). 


Based on data from Singapore's construction sector, construction accidents have been on the rise from 2020 to 
2022, accounting for 171 incidents (26% of total accidents). Of these, FFH accounted for 55 incidents (32%), 
while slips and trips were responsible for 27 incidents (15.8%) (MOM, 2022). 


Meanwhile, in South Korea, despite the implementation of the Major Accident Act in January 2022 aimed at 
reducing fatalities among construction workers, there has been a 0.01% increase in such incidents as of March 
2023. Specifically, in 2022, out of 539 deaths, 268 were attributed to "falls," marking the highest cause. These 
statistics reconfirm that the construction sector is hazardous, with a pronounced risk of fall incidents. 


While various personal protective equipment and safety devices have been developed to prevent accidents, a study 
by (Xia et al., 2018) highlighted that one of the main causes of fall incidents is the absence or insufficiency of 
worker behavior supervision. Information on the posture of construction workers can serve as valuable data for 
evaluating safety and productivity (Xu et al., 2022). Several studies have been driven based on this perspective. 


(Khan et al., 2021) researched methods to detect unsafe behaviors of workers using IMU sensors, specifically by 
assessing the angle of attached hooks. Enrique (Valero et al., 2016) used wearable devices based on IMU sensors 
to analyze movement data and detect unsafe postures. There have also been significant strides in using computer 
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vision for safety research. (MassirisFernandez et al., 2020) proposed a method to assess work risks by analyzing 
joint angles of workers under limited visibility conditions. 


Deep learning methodologies have also been applied to safety research. (Khan et al., 2021) utilized deep neural 
networks to classify and segment worker movements, detecting unsafe behaviors on moving platforms. Pinsheng 
(Duan et al., 2023) introduced a technique using OpenPose to detect personalized stability based on the individual 
characteristics and habits of workers at height. 


However, a gap has been identified in safety monitoring research following the Standard Operating Procedure 
(SOP). This study aims to address this gap by proposing a method to monitor unsafe postures of workers during 
scaffolding operations based on their shoulder joint angles, in accordance with the SOP. 


Furthermore, by actively incorporating such innovative technology at construction sites, it is possible to enhance 
worker safety and improve productivity. For instance, these posture detection systems can support workers in 
maintaining safe working postures and preemptively detect unsafe conditions to prevent accidents. This not only 
elevates safety standards in the construction industry but also enhances work efficiency while reducing fatigue and 
injuries during operations. 


Moreover, the practical utilization of such technology in construction sites can lead to adherence to project 
schedules and cost savings. Real-time posture detection and analysis at the site are expected to contribute 
significantly to enhanced productivity and the creation of a safe working environment in the construction industry. 


2. LITERATURE REVIEW 


The construction industry has been a focal point of research on workplace safety, largely due to the high frequency 
of fall-related accidents (MOM, 2022; U.S. BUREAU OF LABOR STATISTICS, 2023). With a significant portion 
of accidents attributed to falls from heights (FFH), slips, and trips, it is clear that this is an area in need of targeted 
interventions and technological innovation. 


Efforts have been made to mitigate the risk of falls in the construction industry by developing personal protective 
equipment and safety devices. However, studies suggest that the root cause of fall incidents often lies in inadequate 
supervision of worker behavior (Xia et al., 2018). Construction worker's postures are a valuable source of stability 
and productivity information that can be used to manage construction sites (Xu et al., 2022). 


Monitoring changes in a worker's posture can provide information about whether the worker is in a safe or unsafe 
position. Numerous studies have been conducted to detect and analyze unsafe behaviors of workers using this 
approach. For instance, (Khan et al., 2021) conducted a study using IMU sensors to determine unsafe worker 
behavior through the angle of secured hooks. Similarly, Enrique (Valero et al., 2016) used wearable devices based 
on IMU sensors to measure movement data, detecting and characterizing the unsafe postures of workers. Other 
researchers have also leveraged computer vision to detect and analyze worker posture information. For instance, 
(MassirisFernandez et al., 2020) evaluated job risk by inferring joint angles of workers under limited field of view 
conditions based on computer vision. 


Deep learning has also found its applications in this context. (Khan et al., 2021) identified unsafe behaviors on 
mobile platforms through object correlation detection using deep neural network-based worker classification and 
segmentation. Also, (Duan et al., 2023) proposed a personalized stability detection technology based on high- 
altitude workers' body posture patterns by detecting individual physical characteristics and habits using the 
OpenPose method. 


Amidst these technological advancements in detecting unsafe behaviors, there's a noticeable gap when considering 
overlapping or closely interacting workers in construction sites. Addressing this complexity, (Park et al., 2023) 
introduced a method for detecting small and overlapping workers at construction sites, emphasizing the intricate 
interactions and spatial relations of workers in crowded environments. Demonstrated significant potential in 
ensuring workplace safety by identifying such challenging scenarios with higher accuracy. 


While these prior studies have inferred the results of unsafe worker behaviors using sensors and evaluated worker 
stability, they often lack in terms of monitoring safety during scaffolding work as per Standard Operating 
Procedure (SOP). Therefore, this study aims to monitor unsafe postures by detecting shoulder joint angles of 
workers during scaffolding work based on SOP, as an effort to prevent fall accidents in this context. 
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3. METHODS 
3.1 Research framework 


In the construction industry, the safety of workers holds paramount importance, especially with the rising concerns 
related to fall-related accidents. Traditional methodologies have primarily leaned towards sensor-based approaches 
to address these concerns. However, there's a growing interest in methodologies based on computer vision. In this 
framework, we utilize the YOLOv7 Pose algorithm to detect and assess the shoulder joint angles of construction 
workers to discern potential hazardous situations. As illustrated in Fig. 1 the framework to monitor the stability of 
construction workers working at elevated heights consists of five modules. First, data is collected from the 
construction site, encompassing a myriad of movements by the workers. From the collected video data, essential 
frames are extracted and major regions pertaining to the workers are labeled accordingly. Next, the YOLOv7 Pose 
algorithm is employed to detect the skeleton of the workers. Special emphasis is laid on the skeleton information 
around the shoulder joint to calculate the angle of the shoulder joint. Subsequently, an analysis is performed to 
determine if the current angle of the worker's shoulder joint poses any risk in comparison to a set standard angle. 
If any hazardous posture is detected, real-time alerts are displayed, urging the implementation of necessary safety 
measures. 


Data set 


Validation 


Fig. 1: Research Framework. 


3.2 Identification method 


Ensuring the safety of construction workers in high-risk environments, such as working at heights, has always 
been a top priority in the construction industry. Falls from heights are a common type of accident on construction 
sites, and many of these incidents occur when workers lose their balance and become unstable. In particular, 
scaffold work, which is frequently performed by construction workers, is associated with a high number of 
accidents. Therefore, in this study, the OpenPose method, which detects human postures from images and videos, 
was used to detect and analyze the postures and joint information of scaffold workers. The algorithm, trained using 
the MS COCO dataset and videos captured at construction sites, detects and represents human posture as shown 
in Fig. 2 It identifies a total of 17 joints, numbered from 0 to 16, as illustrated in Fig. 2, right image, and provides 
information about each joint. Table 1 illustrates unsafe postures that can occur during scaffold work. While various 
postures are possible during scaffold work, such as sitting, standing, bending, or bowing the head, there is a posture 
where both arms are raised simultaneously, lifting both the left and right shoulders, with a shoulder angle of less 
than 90 degrees. In this study, a posture involving the simultaneous elevation of both arms and shoulders with an 


643 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


angle less than 90 degrees is designated as an unsafe posture. 


Fig. 2: Joint position information. 


Table 1: Unstable behavior criteria for scaffolding operations 


Unstable behavior description Angle name Angle numbers Range Threshold 
Arm rasing Left-Shoulder #756 0°~270° ZLS < 90° 
Right-Shoulder 28,6,5 0°~270° ZRS < 90° 


3.3 Feature extraction 


Algorithm 1 demonstrates a method for representing risk alerts based on angle calculations. This is employed to 
indicate stability alerts based on the angle when a worker raises their arm during operations. The algorithm 
consistently computes risk alerts based on shoulder angles, thereby continuously detecting workers and providing 
real-time notifications for risks associated with unsafe postures. It processes the input, detecting individuals in the 
video, and then delineates Bounding Boxes (BBox) and keypoints. Subsequently, it performs ongoing calculations 
for the angles of the R (Right) and L (Left) Elbows and R, L Shoulders. When the angle falls between 90 and 180 
degrees, a warning notification is sent, while angles below 90 degrees trigger an Unsafe posture notification, which 
is visualized and transmitted. 
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Algorithm 1: Finding Angle between Three Points 


Input: Detected Workers 


Output: angle 


— 


: Extract coordinates for p1, p2, p3 from kpts. 


N 


: Calculate angle using atan2. 


U 


: if angle >= 180.0 THEN label = '[safe!]', color = blue 
4: else if 90 < angle < 180 THEN label = '[Warning!]', color = yellow 


5: else label = '[Unsafe!]', color = red 


O 


: if draw then visualize angle, label on image. 


N 


: Return angle. 


Fig. 3 demonstrates the method for detecting unsafe postures when the algorithm is applied during video recording. 
The YOLOv7 algorithm was utilized for its speed and capability to detect individuals while simultaneously 
detecting joints. The worker's joints are divided into 17 distinct points, from the head to the legs, each assigned an 
ID. To identify unsafe postures during scaffold work, the angles of the shoulders are crucial when the arms are 
raised. Therefore, angles for the joints L-Shoulder(5) and R-Shoulder(6) were detected and calculated. For angle 
calculation, a line connecting L-Shoulder(5) and R-Shoulder(6) was used as the baseline, and vectors were applied 
to the Elbow joint for angle calculation, as the calculation required angles for external rotation at each shoulder 
joint. Following the KOSHA GUIDELINE, when a worker raises their arm, an unsafe posture is determined based 
on a 90-degree threshold. The system alerts the user with three levels: over 180 degrees for safe, between 90 and 
180 degrees for warning, and below 90 degrees for unsafe. 


Sroda Aglo Qutouion 


T SS =" Sy 
Baseline 


f angle > 180° A 


A Safety] . 


angle < 90° 


Wosatety 


Fig. 3: Joint information and how to calculate shoulder joints 


645 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


4. RESULTS 


(b) 


Fig. 4: Worker safety monitoring results during scaffolding operation. (a) Scaffolding installation work image 
taken diagonally, (b) Scaffolding installation site work image taken from the front 


In Fig. 4 (a) presents the results of detecting a worker engaged in scaffold installation captured from a diagonal 
angle. When the worker raises their arms, the proposed angle-based criteria triggers a [Warning!] alert, indicating 
the detection of the worker and the activation of the alarm system. In (b), an image captured from the front 
showcases the construction site, not only identifying scaffold workers but also other workers engaged in different 
processes. As these workers are not in the arm-raised posture, no warning alerts are generated. 


The proposed method for detecting unsafe postures during elevated work involves worker detection on 
construction sites and shoulder joint angle assessment for safety monitoring. This approach aligns with safety 
posture guidelines applied when workers raise their arms during elevated tasks, providing real-time warning 
notifications. This experiment validates the robustness of this study in facilitating easier site supervision and 
monitoring by administrators in complex and chaotic construction environments. 


5. CONCLUSION 


This study aims to contribute to the advancement of automated safety monitoring technology in the field of 
construction sites. The proposed method utilizes a Deep Learning-based OPENPOSE pose estimation model to 
estimate the postures of scaffold workers on construction sites. This estimation aims to provide real-time risk alerts 
to prevent falls from height (FFH) accidents. This research is expected to reduce the fatigue of construction site 
managers and decrease the occurrence rate of scaffold-related fall accidents. 


However, during this research, limitations and areas for improvement have been identified. Due to the nature of 
construction sites, there are often obstacles and situations where workers are obstructed or difficult to detect due 
to equipment. Additionally, as objects move farther from the camera, issues were observed in joint angle 
calculations and false positives/negatives of worker detection. This indicates the need for further research and 
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algorithm improvements. To address these challenges, future research aims to enhance detection accuracy based 
on distance and devise strategies to detect workers obstructed by obstacles. Furthermore, incorporating Multi- 
Camera setups is intended to calculate worker head angles. 
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PREDICTIVE SAFETY MONITORING FOR LIFTING OPERATIONS 
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ABSTRACT: Construction industry has reported among the highest accident and fatality rates over the past 
decade. In particular, crane lifting is a notably hazardous operation on construction sites, causing fatal accidents 
like workers being struck by the boom or objects fallen from tower cranes. Manual monitoring by on-site safety 
officers is labour-intensive and error-prone, while incorporating computer vision techniques into surveillance 
cameras would enable more automatic and continuous monitoring of construction site operations. However, 
existing studies for lifting safety mainly detect the presence of individual objects (e.g. workers, crane components), 
while a methodology is needed to predict their potential collision more proactively before accidents happen. This 
paper develops a vision-based framework for predictive lifting safety monitoring, including three modules: (1) 
object detection and classification: targeting at hook and lifting materials to enable danger zone estimation, along 
with workers and their personal protective equipment; (2) worker movement tracking and prediction: analyzing 
the historical moving trajectory of each unique worker to foresee his/her future movement in certain period ahead; 
(3) multi-level safety assessment: issuing predictive warning in real-time upon any crane-worker conflict foreseen. 
The proposed framework is applicable to real-time site video processing and enables end-to-end lifting safety 
monitoring with instant alerting upon unsafe scenarios observed. Importantly, the proposed framework predicts 
the future movement of workers to proactively identify potential site hazard, in order to trigger earlier safety alert 
for more timely decision-making. With a large video dataset capturing tower crane operations, the proposed 
framework demonstrates competitive accuracy and computational efficiency in crane-worker conflict prediction, 
validating its practicality for real-time lifting safety monitoring. 


KEYWORDS. Computer Vision; Construction Safety Monitoring; Crane-Worker Conflict Prediction; Deep 
Learning; Predictive Safety Assessment; Trajectory Tracking. 


1. INTRODUCTION 


The construction industry has been plagued for long by a high frequency of accidents and fatalities. According to 
statistics from the Hong Kong Labour Department (2018), the industry accounted for 76% of occupational 
fatalities in 2017, making it the most dangerous sector in Hong Kong. Similarly, the U.S. Bureau of Labour 
Statistics (2018) reported an average rate of 2.6 deaths per day, resulting in 949 deaths for the year. With reference 
to an overview of the Hong Kong construction industry (Shafique & Rafiq, 2019), there were on average 3597 
occupational injuries and 20 occupational fatalities per year between 2011 and 2017. The U.S.A Department of 
Labour (2022) indicated that the estimated cost of employers’ direct compensation to construction accidents is up 
to US$1 billion per week. These statistics suggest the urgent need for improved construction safety measures to 
protect the lives of workers and mitigate the financial burden that accidents impose on employers and the economy. 


To address this critical issue, governments have established safety guidelines and regulations to standardize the 
industrial practices of construction safety monitoring. Lifting operations using tower cranes are a crucial aspect 
of construction work that requires particular attention, as they involve dynamic interactions between workers and 
machines. Traditionally, safety monitoring relies heavily on manual inspection by on-site safety officers. However, 
this method is prone to errors due to human fatigue, which can result in overlooked incidents. In recent years, 
advancements in artificial intelligence have led to the development of computer vision (CV) methods that can 
automate construction safety monitoring. These methods enable real-time object identification, improving the 
accuracy and efficiency of safety monitoring. However, there are two research gaps: (1) Previous approaches have 
focused on analyzing individual objects, such as workers and machines, separately, without a more comprehensive 
framework that considers their spatial interaction in real-time. (2) Previous studies have primarily focused on 
analyzing current scenarios/activities on sites, while a more predictive mechanism is needed to proactively 
identify and prevent potential accidents ahead of time. Therefore, this study develops a predictive safety 
assessment framework that monitors potential crane-worker conflicts and enable proactive incident prevention, 
ultimately reducing the number of accidents and fatalities in the construction industry. 
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SECTION C - Al, DATA SCIENCE AND ANALYTICS 


2. RELATED WORK 


Collision between workers and construction equipment happens regularly in complex and distracting construction 
environment that is overcrowded with workers. Close contact between construction machines and workers are 
one of the major causes of collision event that lead to injuries and deaths. Sensors such as GPS and RFID have 
been explored in prior studies. With the help of sensors, real-time spatial-temporal information could be provided 
for proximity measurements. As a result, a spatial-temporal relationship can be detected, and an early warning can 
be sent out to prevent the accident from happening (Liu et al., 2021). However, to obtain enough information to 
safeguard the construction site, numerous sensors have to be installed. The heavy financial burden will be caused 
by purchasing and hiring a professional individual to install and maintain the sensors (Zhang et al., 2020). 


CV-based object tracking is a superior alternative to sensors since it lowers the cost and requires fewer resources 
to set up, therefore, more appealing to the industry. Previous research has trained YOLOv3 deep learning model 
for 2D positioning various construction site entities on 2D images captured from Aerial vehicles. Several studies 
developed convolutional neural networks to detect personal protective equipment (PPE), such as helmet and 
reflective vest (Cheng et al., 2022, Fang et al. 2018, Nath et al. 2020). Besides object detection, several studies 
developed human tracking algorithms to analyze behavior of each person more continuously (Kim et al., 2019, 
Wong et al., 2021). Other studies attempted to predict the future action of construction machines like excavators, 
based on their historical motion patterns (Luo et al., 2021), and also semantic segmentation that fine-grains the 
detected objects at pixel level to allow better positioning (Jeelani et al., 2021). 


While previous studies can perform real-time object detection and tracking, a more comprehensive framework 
beyond developing those algorithms is needed for practical construction safety monitoring. An automatic safety 
evaluation system shall be established to enable effective intervention mechanisms to prevent the accident from 
happening. Previous studies have proposed some distance-based hazard evaluation criteria (Son et al., 2019). 
Some researchers have taken the velocity of construction equipment and workers into consideration as there is an 
association between larger velocity and collision accidents (Golovina et al., 2016). A previous study has attempted 
to determine the dynamics direct fall zone of a crane load using a mounted tower crane camera with computer 
vision (Chian et al., 2022). These studies enhance construction safety monitoring with the ability to predict the 
direct fall zone, where workers can be proactively prevented from entering danger zones. 


3. PROPOSED METHODOLOGY 


To facilitate tower crane safety monitoring, a vision-based framework is developed which comprehensively 
supports end-to-end CCTV analytics for real-time safety assessment. The overall procedure and information flow 
are summarised in Figure 1, with three major functional modules: (1) object detection and classification: 
interested objects in each video frame are detected and classified into three categories (i.e. workers, hook and 
lifting materials); (2) worker movement tracking and prediction: analyzing the historical moving trajectory of 
each unique worker and predict his/her possible location in certain period ahead; (3) multi-level safety 
assessment: issuing predictive warning in real-time upon any unsafe crane-worker conflict observed. 


Module 1: Object Detection and Classification 


Lifting material 


Module 2: Worker Movement Module 3: Multi-level 
Tracking and Prediction Safety Assessment 


Visual appearance features Danger zone estimation 
+ + 
Positional change patterns Predicted worker movement 


l 


Movementtrajectories Worker-crane conflict? 
uy Moving velocity iy Alarm/message 
Predicted location Predictive warning 


Figure 1: Overall information flow of the proposed crane safety monitoring framework 
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3.1 Object detection and classification 


Upon receiving the raw videos from multiple cameras, the objects of interest are identified in Step 1. This is a 
crucial step that demands an automated process and accurately bounded objects (i.e. cropping the portion of image 
around each object with minimal background clutter). There have been numerous studies specialized in 
construction object detection (Fang et al., 2018; Luo et al., 2019; Memarzadeh et al., 2013), as well as 
comprehensive surveys of various state-of-the-art object detection methods (Brunetti et al., 2018; Huang et al., 
2017). Hence, this paper adopts a competitive detection model for the object detection step. In particular, the 
YOLOvé8 algorithm is used in view of its detection accuracy and inference efficiency revealed in recent studies. 


More specifically, three types of construction objects are targeted, i.e. construction workers, crane hook and lifting 
materials during crane operations. With videos collected from construction sites, each object-of-interest is detected 
and cropped as a rectangular bounding box. Subsequently, a classification module outputs the corresponding class 
index associated with each bounding box. Figure 2 illustrates a sample output of detection and classification 
(worker bounded by a purple box, hook by red and lifting material by green). 


Figure 2: Illustration of object detection and classification results 


For those detected boxes labeled as workers, a more fine-grained classification regime is defined to further analyze 
whether each worker wears necessary PPE, i.e. helmet and vest. As illustrated in Figure 3, two sub-categories are 
output by the classification model to determine the presence of helmet and vest respectively in their corresponding 
part of a body. To make the methodology more practical, the model is trained with both confirming and dis- 
confirming classes, e.g. the head part is marked even no helmet exists around there. This approach renders the 
PPE inspection more accurate, because it avoids improper behavior, e.g. hand-carrying a helmet without properly 
wearing on the head. In that case, our method can correctly report that PPE is not properly worn, while ordinary 
detection method only identifies the presence of PPE in hand, which indeed violates PPE compliance. 


Disconfirming Classes (without PPE equipped) 


No Helmet ~~ 


p et ee ee N 
| No Vest | 


Figure 3: Definition of PPE statuses of a worker 
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3.2 Worker movement tracking and prediction 


On top of the object detection results, worker trajectory tracking is performed to support worker behavioral 
analysis based on movement pattern. In this study, the method DeepSORT (Wojke et al., 2017) is utilized as a 
baseline to perform worker trajectory tracking over video frames, acquiring a complete trajectory of individual 
construction worker. The set of bounding boxes classified as ‘worker’ in Step 1 are further processed by 
DeepSORT, which subsequently analyzes the appearance features extracted from each worker and the positional 
change of the bounding boxes, in order to map unique identities to individual worker. Figure 4 illustrates the 
assignment of unique identities to individual workers (22 to the left-sided worker, 29 to the right one). 


Figure 4: Illustration of worker tracking with unique identity assigned to different workers 


Based on the trajectories of individual construction workers, their potential movement is then predict to foreseen 
whether their moving trajectories will potentially coincide with any lifting zone of the tower cranes nearby. This 
will allow dispatching warning signals in a more timely manner before workers actually enter the lifting zones. 
The prediction of future movement of each worker is defined in Equation (1), which computes the image 
coordinates of the predicted worker location ¢ timesteps later based on his/her observed velocity v. 


d2= dı+ vX t (1) 
where, 
dı = coordinates of the current location, 
d2 = coordinates of the predicted location, 
v = velocity along corresponding direction, 
t = time (measured by number of frames). 


3.3 Multi-level safety assessment 


By combining the output from object detection and worker movement tracking modules, spatial relationship 
between construction workers and lifting equipment is established. Regarding tower crane operations, the “Safe 
Lifting 3-3-3” Principle published by Hong Kong government (2020) is an industry standard in lifting operations. 
As illustrated in Figure 5, the 3-3-3 Principle states that workers should keep themselves 3m away from the lifting 
materials to ensure their safety. Yet, the 3-3-3 Principle only defines a single level of safety distance to be 
maintained from the lifting zone, while different degree of proximity may imply various levels of safety. Moreover, 
the standard only considers static behavior of workers (i.e. current location), while a more predictive safety 
monitoring regime is needed to consider the possible movement of each worker in certain period ahead. 


Figure 5: Definition of Safe Lifting 3-3-3 Principles 
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On top of the 3-3-3 Principle, the 3m operating region is defined as the danger zone around the lifting material 
detected by the object detection module. With the bounding box of lifting material generated, a danger zone of 
radius 3m around the bounding box centre is estimated. Different warning signals are then sent according to the 
corresponding risk scenarios and the predicted trajectory of the workers. As summarized in Table 1, different 
scenarios implies corresponding level of response. If a worker is already inside a fatal zone, ‘Action’ is issued to 
urge for immediate handling. If he/she is predicted to enter a fatal zone in 3 or 5 seconds, the responses become 
‘Alarm’ and ‘Alert’ respectively which are less severe. Such a multi-level warning mechanism enables more 
flexible and predictive safety assessment. 


Table 1: Proposed three-level mechanism of lifting safety assessment 


Response Scenario 


Action Worker inside fatal zone 
Alarm Worker will enter fatal zone in 3 seconds 
Alert Worker will enter fatal zone in 5 seconds 


To alert both the workers and the residential site safety personnel to the potential safety hazards, an instant warning 
system is developed with a series of if-else loops, and connected with an external chatbot API. When the model 
detects the worker’s tendency to enter the defined lifting zone, warning messages are issued to inform safety 
officers of the incident detected. The corresponding frames is also captured, with descriptive text about the unsafe 
scenario, and sent to registered stakeholders via an instant messaging platform (e.g. Telegram) for remedial actions. 


4. EXPERIMENT 


4.1 Experimental setup 


To prepare a rich dataset for validation, CCTV videos taken in different angles were collected, including those 
taken by at-grade cameras and mounted-on-crane cameras. Table 2 summarizes the attributes and sources of the 
videos solicited, which can be referred to in future studies for tower crane safety monitoring. 


Table 2: Statistics of the image dataset collected for model evaluation 


Angle Length Types Sources 
At-grade 2 min 50sec PPE wearing https://youtu.be/zmVjnWEX 5c 


At-grade 24 min 36 sec Crane operations _https://youtu.be/AgSyV8qZKMQ 
At-grade 15 min 56 sec Worker behavior https://youtu.be/3AbhT6TLf60 
Top-down 3 min Crane operations https://youtu.be/IlaEJgq0aEw 
Top-down 4min18sec Crane operations _https://youtu.be/Vg6SOcPviDs 
Top-down 4min 35sec Crane operations _https://youtu.be/IrhQHX3r-pM 
Top-down _59 sec Crane operations __https://youtu.be/viBcyF2H_1A 


A total of 5575 images were generated by extracting frames out of the collected videos, with manual inspection 
and sampling of high-quality frames, i.e. capturing diverse details of worker / crane operations. A detailed statistics 
of the dataset generated is summarized in Table 3. 


Table 3: Statistics of the image dataset collected for model evaluation 


Set No. of images 
Training 4889 
Validation 458 
Testing 228 
Total 5575 


The collected data then underwent a series of augmentations to maximize the generalization capability of the 
model being trained. The types of pre-processing include image resizing, rotation by EXIF orientation values and 
grayscale conversion. The images were also augmented by horizontal and vertical flipping, hue and saturation 
adjustment. Afterwards, the dataset was split into training, validation and testing sets. The training set was fed 
into different variants of object detection models, including YOLOv8-Large, YOLOv8-Small and YOLOv8-Nano, 
which consist of different degrees of model complexity in terms of neural network architecture. 
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As summarized in Equations (2)-(4), the evaluation metrics of for object detection and classification include 
recall, precision and average precision (AP) score. Moreover, the accuracy of worker trajectory tracking is 
evaluated by multi-object tracking accuracy (MOTA), as defined in Equation (5), where [DSW denotes the 
frequency of identity switching among workers detected. In addition, the computational speed of the proposed 
method is also evaluated (in frame-per-second), which validates the practicality of our framework in real-time 
CCTV processing for construction site monitoring. 


TP 
Recall = —— 2 
ees = RBS EN (2) 
TP 
Precision = —— 3 
recision = 75 7 FP (3) 
TP +TN 
AP So ee ee 4 
score = TP +FP+TN +FN (4) 
FN + FP + IDSW 
MOTA = 1 —-——___——— (5) 
TP +TN 


4.2 Results and discussion 


Table 4 summarizes the AP scores of the object detection module of the proposed framework. Overall, a mean AP 
of 97.0% is achieved among all the three object classes, with 99.5% AP score for both the classes ‘hook’ and 
‘material’. A slightly lower AP score of 92.0% is obtained for ‘worker’, because of the significant variation of 
worker sizes in the images, which capture both top-down and at-grade angles from largely varying distances. 


Table 4: Evaluation results of object detection 


Class AP score 
Worker 92.0% 
Class-wise Crane hook 99.5% 
Lifting materials 99.5% 
Recall 98.0% 
Overall Precision 96.0% 
Mean AP 97.0% 


Table 5 summarizes the AP scores of the worker PPE classification module of the proposed framework. Overall, 
the mean AP improves from 96.9% to 99.5% when training the PPE classification module with both confirming 
and dis-confirming cases. Such an approach also boosts the class-wise AP scores, from 96.7% to 99.3% (‘helmet’) 
and from 97.0% to 99.5% (‘vest’). The effect of incorporating the dis-confirming cases is that the classification 
model has learnt more distinctive features of those PPE from the negative samples. For instance, by seeing 
ordinary cloths without vest, the model intrinsically learns better how a vest should look like and hence more 
accurately classifies whether a person is properly wearing a vest. 


Table 5: Evaluation results of worker PPE classification 


Case 1 — trained with Case 2 — trained with 
confirming classes only confirming & dis-confirming classes 


Helmet 96.7% 99.3% T 
0, 
Class-wise AP scores TA PT A 7 
No vest / 99.6% 
Recall 96.9% 98.8% T 
Overall scores Precision 97.1% 98.8% T 
Mean AP 96.9% 99.5% T 


Table 6 summarizes the MOTA (for worker tracking) and computational speeds when combining DeepSORT with 
different YOLOv8 variants. Regarding worker tracking accuracy, YOLOv8-Large outperforms the other two 
models with the highest MOTA of 90.1%, while having slower computational speed than the other two (2.7 frames 
per second). YOLOv8n-Nano shows the fastest inferencing (13.4 frames per second), while its MOTA is 81.8% 
which may be due to the increased chance of missing detections. Hence, YOLOv8-Small achieves the most 
balanced performance (85.2% MOTA and 7.9 frames per second). 
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Table 6: Evaluation results of worker trajectory tracking and overall computational speed 


Model variant MOTA Computational speed 
(frame-per-second) 
YOLOv8-Large+DeepSORT 90.1% 2.7 
YOLOv8-Small+DeepSORT 85.2% 7.9 
YOLOv8-NanotDeepSORT 81.8% 13.4 


Figure 6 illustrates the predictive warning mechanism of the proposed framework. The developed modules 
process a complete video and identifies that construction workers are working within the danger zone during 
lifting operations. By detecting the location of the lifting equipment and tracking the movement of individual 
construction workers, warning signals and recommended actions are dispatched via a Telegram chatbot upon 
identifying the unsafe scenarios. The spatial relationship among the equipment and workers is accurately 
established, which then informs on-site safety managers of the workers’ risk statuses, urging for immediate actions 
more timely. Hence, our proposed framework enables more predictive safety monitoring of crane operations. 
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Figure 6: Demonstration of predictive warning mechanism in video processing 


5. CONCLUSION AND FUTURE WORK 


This paper proposes a vision-based framework for predictive lifting safety monitoring, relieving the tedious and 
error-prone manual inspection on sites in traditional practices. By analyzing the spatial interaction among essential 
objects in lifting operations (e.g. predicted movement of workers, danger zone around hook and lifting materials) 
more predictive incident identification is enabled for timely on-site safety assessment. The competitive accuracy 
and computational efficiency demonstrated in this study validates the practicality of the proposed framework. 
Based on the experimental findings, two research directions are suggested for future research: (1) camera 
placement optimization in actual deployment, considering various factors such as view coverage, degree of 
object occlusion, view angle and distance (implying video quality and hence analytical accuracy), etc. Research 
effort may be devoted into quantifying these factors into optimization framework formulated for camera 
placement, including the number of cameras, their position and orientation, etc.; (2) multi-modal sensor 
integration, extending the vision-based methodology to analyze more worker behavior such as injury/fall 
detection, and possibly also incorporating other kinds of sensors such as temperature sensor for heat-stroke 
warning monitoring and proximity sensor for worker-equipment conflict. More comprehensive research in the 
future will contribute to forming a systematic approach for construction safety monitoring. 
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LOCALIZING AND VISUALIZING THE DEGREE OF PEOPLE 
CROWDING WITH AN OMNIDIRECTIONAL CAMERA BY 
DIFFERENT TIMES 


Tomu Muraoka, Satoshi Kubota & Yoshihiro Yasumuro 
Kansai University, Japan 


ABSTRACT: The Corona Disaster increased the demand for information on the degree of human crowding, as it 
was essential to balance avoiding restricting behavior and reducing the risk of crowding. Although there are many 
technologies for detecting people using monitoring cameras, the number of cameras installed in a wide area is 
costly, and coverage is limited. In this study, we propose a method to qualitatively visualize the distribution of 
people by using images captured by a moving omnidirectional camera from the viewpoint of facility management 
during regular security patrols. Omnidirectional images are used for both 3D modeling of the target space based 
on SfM (structure from motion) and person detection/tracking by machine learning. The distribution of people is 
visualized qualitatively by obtaining the positions of the extracted people on the 3D model of the site and mapping 
them. The parallel software processing of visitor observation and mapping is expected to be highly cost-effective 
in terms of implementation and operation. On the other hand, although there are time deviations in the mapping 
depending on the location, the visualization and the updated time show their usefulness in understanding the 
distribution of congestion. 


KEYWORDS: COVID-19, people's congestion, omnidirectional camera, SfM (Structure from Motion), machine- 
learning 


1. INTRODUCTION 
1.1 Research background 


COVID-19 infection was moved to category five infectious disease in Japan on May 8, 2023. The wearing of 
masks has been left to the discretion of individuals and businesses, and the condition is coming to an end. During 
the coronavirus outbreak, infection control measures such as wearing masks, hand sanitizers, and refraining from 
going to places with a high risk of becoming infected became widespread at the individual level and effectively 
controlled other infectious diseases. However, with the relaxation of waterfront measures and the increase in the 
number of foreign tourists, there is a risk that other contagious diseases may be brought into Japan, and the number 
of older people at high risk of serious illness is expected to increase due to the aging of society. As shown in Fig. 
1, the number of influenza cases reported from medical institutions (fixed points) nationwide in 2023 showed an 
increasing trend, with many weeks exceeding the average number of cases reported over the previous five years 
(National Institute of Infectious Diseases, 2023). 
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Fig. 1: Number of Influenza Infections 
(National Institute of Infectious Diseases, 2023, partially modified and translated) 
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Therefore, in the after-coronas, it is necessary to continue infection control measures for individuals such as the 
older generations and people with underlying medical conditions who are at high risk of serious illness. 


On the other hand, consumers are restricted in their purchasing behavior by spatial interference and competition 
among customers in a retail environment with high customer density within a store. As a result, it has been 
confirmed that consumers' purchasing decisions are negatively affected, resulting in lower satisfaction (Eroglu et 
al., 2005). Therefore, in terms of large-scale facility management, a system to ensure social distance is still essential 
in tourist attractions, commercial facilities, and other places where many people usually gather, and there is a high 
demand for being able to check the level of congestion in an environment where people tend to gather before 
visiting. 


1.2 Previous work 


An example of a familiar means for consumers to obtain local congestion information in advance is the 
"Congestion Radar" published on the web by Yahoo! (Yahoo! JAPAN,2023). Fig. 2 shows an example of the 
"Congestion Radar" display of congestion in the vicinity of Tokyo Station (Japan). The "Congestion Radar" 
visualizes the degree of congestion on a heat-map-like color-coded map of Yahoo! Japan by generating statistics 
of user location information based on the usage of applications provided by Yahoo! As shown in Fig. 3, 
EXPOCITY (Japan), a large-scale commercial facility, uses existing technology to visualize the traffic volume 
aggregated by sensors that detect intrusion and passage in a color-coded format, allowing users to view parking 
lot congestion on the official homepage. These are all abstract and have the disadvantage that it is difficult to 
confirm the state of each part of a commercial facility and the distribution of individual people, making it difficult 
to visualize the state of congestion. In addition, many technologies detect and visualize people using monitor 
cameras inside facilities (Hitachi, Ltd, 2020, for example). However, since it is necessary to install multiple 
cameras in a wide area to check and compare the field of view individually, the operation of existing monitor 
cameras is costly in terms of the number of cameras installed, and there are also limitations in terms of the stability 
and comprehensiveness of the images. As a study to visualize human distribution from monocular wide-field 
images, the authors have conducted 3D modeling of the target space based on SfM (Structure from Motion) and 
human detection and tracking processing by machine learning from images captured by a 360 deg camera and 
mapped them onto a 3D model of the site and demonstrated its effectiveness in principle (Muraoka et al., 2022). 
However, this method has drawbacks in systemization regarding scalability and continuous operation since the 3D 
data of each site is regenerated each time it is photographed to update the information on the distribution of people. 
In addition, because the people's distribution is visualized on the 3D model of the site, it was not easy to grab the 
surrounding location relationships and thus limits readability for users as a map. 
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Fig. 2: Example of congestion radar on a map 
(Yahoo! JAPAN , 2023, partially modified) 
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Fig. 3: Example showing parking congestion 
(EXPOCITY,2023, partially modified) 


2. PROPOSED METHOD 
2.1 Outline of the proposed method 


To solve the above problem, this study proposes a method of displaying and updating congestion information on 
a floor map, as shown in Fig. 4, by moving around while taking pictures with an omnidirectional camera carried 
by security guards or regular base patrol officers in large commercial premises where people move in and out 
relatively frequently. Fig. 5 shows the processing steps of the proposed method. First, an omnidirectional camera 
captures images of the surroundings while moving around the target site. Using the photos from the video frames 
as input for the SfM process, which performs 3D reconstruction of the target scene, and a 3D coordinate system is 
constructed. As shown in Fig. 6, the captured video is processed for large-scale sites by dividing it into multiple 
areas. SfM is performed for each area to generate a 3D reconstruction and coordinate system. The video is zenith- 
corrected so that the vertical axis direction of the image is aligned with the vertical direction of the SfM model. 
Next, a coordinate transformation equation from the SfM model coordinates to the image coordinates of the floor 
map image of the facility is calculated by solving a point set matching problem, and the positional relationship 
between the SfM model and floor map for each patrol position is mapped as shown in Fig. 7. The above process 
is the preliminary processing performed only at the beginning. 


In the person placement process, the same zenith correction processing as above is performed on video frames 
taken by security guards and others during their patrol duties, and the video data is stored in the cloud service. The 
video frames are used as input for person detection and tracking. Fig. 8 shows a person detection and tracking 
process for a video image V taken at a particular location. A specific detector detects multiple targets to be tracked 
in each video frame, and the same ID is assigned to the same target tracked from frame to frame. In a group of 
images obtained at regular intervals from video frames (Frame t in Fig. 8), the information on the same person in 
the frames is integrated to determine whether the object detected as a person in each image is new. If the person is 
newly detected, the image in which the person is detected is added to the set of input images for SfM. When SfM 
is executed again, the coordinate system that has already been generated is maintained, and the information of the 
added image is processed incrementally to calculate the coordinates of the shooting position of the new image. 
Then, the coordinates of the feet of the new person in the image are obtained, and the person's position in the SfM 
model is calculated based on the relationship with the camera coordinates. In this way, the distribution of persons 
can be additionally and successively updated in a consistent coordinate system. 


The position of a person on the floor map is calculated by transforming the coordinates from the previously 
calculated SfM coordinates to the floor map image, and a symbol of the person's size is placed. The symbols on 
the floor map are then deleted each time data captured by the omnidirectional camera is input, and the map 
displaying the distribution of people is updated at each time based on the input timestamps. Even over a wide area, 
the map can be updated piecemeal, corresponding to the location relation on the floor map. SfM is performed on 
the video frames captured in each area. In this way, the system patrols and displays the positions of persons on the 
floor map, providing the user visualization of congestion in advance. 


659 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


30m 


a) 


Fig. 4: An example of a floor map (Kansai University in japan) 
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Fig. 6: Example patrol routes at the target site 
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Fig. 7: Mapping examples of SfM model coordinates to the floor map coordinate 


V-Frame t-1 V-Framet V-Frame t+1 


Fig. 8: Human detection and tracking through consecutive frames 


2.2 Identification of the human's position 


In identifying the standing position of a human on the floor map, the first step is to obtain the pixel positions (a, b) 
of the feet of the person in the omnidirectional image with width W px and height H px, as shown in Fig. 9. Since 
the height direction of the omnidirectional image corresponds to the Y, axis of the camera coordinates through the 
zenith correction process, as shown in Fig. 10, let @, be the angle between the vertically downward direction of 
the omnidirectional camera and the line-of-sight direction through the human's feet, and 0, be the azimuth angle 
of the person's feet based on the Z axis of the camera coordinates, 0; = ma/H and 0, = 2mb/W and The 
following is a calculation of the azimuth angle between the ground and the origin of the camera coordinates. Next, 
the height h from the ground to the origin of the camera coordinates is the difference between the Y,, coordinates 
of each camera and the Y, coordinates of the point cloud of the ground detected by plane estimation using 
RANSAC (M. A. Fischler et al.,1981) on the point cloud data obtained by the 3D reconstruction process of the 
target site. In the X,-Z, plane of the camera coordinate system, if the distance from the origin to the person is d, 
the position of the human (X, Z) projected onto the X,-Z, plane is obtained by the following equation. 
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d = htan 0; (1) 
X = d sin 0, (2) 
Z =d cos 0, (3) 
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Fig. 9: Position of a person in an omnidirectional image 
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Fig. 10: Position of the camera and the person 


Fig. 11 shows the relationship between the SfM model's coordinate system and each camera's coordinate system 
for each video record. The position of the camera coordinate system is the coordinate system calculated by SfM, 
which is also the position of the patrolman. Since the Y, axis of the camera coordinate system and the Y,, axis of 
the SfM model correspond to the zenith direction, the positions of the persons detected at each shooting point can 
be integrated into the same coordinate system by rotation and translation on the Xw - Zw plane. Let R(@) be the 
rotation and ( t,, tz) be the camera position relative to the world coordinate; the position of a person (Xy , Zw) in 
the 3D model can be calculated from the local camera coordinate (X ,Z) by equation (4). 


P| =r@ k] [2] (a) 
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Next, the coordinate system X,,-Y,, plane of the floor map image of the facility and the Xw-Zw plane of the SfM 
model coordinate system can be transformed into the position of the human on the SfM model by rotation pọ 
around the Y axis in the SfM model coordinate system and translation shift, as before. Let S be a scaling matrix, 
R(@,,) be a rotation matrix, and the origin coordinates (mx, my) of the SfM model coordinate system in the Xm- 
Zm plane and the human's position (Xm, Ym) can be calculated as follows. 


Ly] = s rcon |" | + [me (5) 
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Fig. 11: Relationship between the SfM model coordinate Xy-Yw-Zy and the coordinates of the moving camera 


3. EXPERIMENT 
3.1 Equipment 


For the experimental environment, we used a location in front of a convenience store on the Senri-yama 
campus of Kansai University (area (i) in Fig. 6), assuming a commercial facility crowded with many people. 
Theta X (RICHO) was used for the omnidirectional camera, with a video resolution of 5760 x 2880, Metashape 
Professional (Agisoft) for SfM, YOLO-X (Z. Ge et al.,2021) was used for the human detection model, and 
motpy (motpy - simple multi-object tracking library, 2022) was used for human tracking, as it is easy to 
combine various object detection models and OpenCV, an image processing library, was used to display the 
location of the person on the floor map of the target site. 


3.2 Construction of coordinate system 


As shown in Fig. 12, a 3D reconstruction based on SfM was performed using 91 images taken at the site, and 
a coordinate system was constructed. The average error was approximately 0.05 m. The number of tie points 
was approximately 4.7. The number of tie points was about 47,000, and the RMSE (root mean square error) 
of the re-projection of the feature points estimated from the image set onto the original image was about 3.0 
pixels, indicating that the 3D shape is generally accurate. In the mapping between the coordinate system of 
the SfM model and that of the floor map, first, we visually picked up the corners of buildings and roads in 
each coordinate system to prepare five pairs of coordinates of ( X,, , Zw) and (Xm, Ym). Next, the distance from 
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the center of gravity of the five points to each of the five points in each coordinate was calculated, and the 
average value was calculated. There is a method to calculate the rotation matrix and translation vector in the 
point set matching problem by SVD (singular value decomposition) (K.S.Arum et al.,1987). In this study, 
R(@,,) and (mx my) are calculated by SVD. 5 points on the SfM model are transformed to positions on the 
floor map by equation (5). The results displayed on the floor map are shown in Fig. 13. The exchanged 
positions of the five points on the floor map generally corresponded to those of the five points on the SfM 
model. 
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Fig. 13: Correspondence between SfM coordinate system and floor map coordinate system: 
blue dots indicate 5 points in each coordinate system 
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3.3 Integration and display of person location information 


YOLO-X was used for performing human detection for obtained image set from each section's video, as 
shown in Fig. 14. If the detected person has been tracked from a previous image frame by motpy using the 
video as input, the detected human is integrated into the existing person's information. If the human was not 
tracked and was newly observed, the observed image was used to estimate the camera position in the 
coordinate system of the SfM model, and the camera coordinates were calculated. The coordinates of the 
human's feet and the coordinates of the floor map were used to calculate the position coordinates of the 
human using Equations (1)-(5) and displayed on the floor map (Fig. 15, 16, and 17). The positions of the 
people mapped on the floor generally corresponded to the original images, and their positions about the 
buildings were also confirmed. However, because YOLO-X can detect people even in the case of body parts, 
and because the images are taken while moving, there are many cases where people in occlusion due to 
occlusion can be observed and mapped in other locations, making it possible to visualize the distribution of 
the people quantitatively. In addition, we were able to confirm changes in the degree of crowding at different 
times, such as around noon (Fig. 15), when the area is crowded at lunchtime, there is extreme crowding near 
the store entrance, many people move during class breaks (Fig. 16), and the area is empty during class 
(Figure17). Area (ii) in Fig. 5 was also patrolled around noon on the same day. The distribution of people 
observed was displayed on the floor map in the same manner as above (Fig. 18). As a result, it was possible 
to visualize the occurrence and resolution of queues and crowding of people near building entrances and the 
vicinity of stores, depending on the time of day. In a wide area, it was confirmed that information on the 
distribution of people could be updated for different areas in parallel by patrolling the area with several people. 
These functions are practical for visitors and others to know the trend of human distribution. 


Fig. 14: Human detection using YOLOX for an omnidirectional image 


4. CONCLUSION 


In this study, we proposed a method for mapping and updating the distribution of people on a floor map in a 
fragmented manner, even over a wide area, simply by walking around the site and taking pictures with an 
omnidirectional camera and confirmed the method's effectiveness through experiments. The proposed method 
cannot perform synchronized observations because both the observer and the visitor are moving. However, it 
requires far fewer cameras and is equivalent to observing from many viewpoints because of the moving point 
observation. Therefore, parallel software processing of visitor observation and mapping will be highly cost- 
effective in terms of implementation and operation. Although the visitor mapping has time deviations depending 
on the location of the observation, visualization of the updated time together with the map will help understand 
the distribution of congestion. In future work, the authors plan to investigate a method of real-time mapping by 
online processing using the live streaming function of the camera and to construct a system that allows users to 
view maps showing the distribution of people on the Web. 
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Fig. 15: Placement of people on the area (1) map: 12:00 PM: 
Red dots indicate the visitor's location. 


Fig. 16: Placement of people on the area (i) map: 2:30 PM: 
Red dots indicate the visitor's location. 
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Fig. 17: Placement of people on the area (i) map: 4:00 PM: 
Red dots indicate the visitor's location. 


Fig. 18: Placement of people on the area (ii) map: 12:00 PM : 
Red dots indicate the visitor's location. 
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Park 
ConTI Lab, Department of Architectural Engineering, Chung-Ang University, Seoul, South Korea. 


ABSTRACT: The construction industry faces significant challenges, including a high prevalence of occupational 
incidents, often involving fires, explosions, and burn-related accidents due to worker non-compliance with safety 
protocols. Adherence to safety guidelines and proper utilization of safety equipment are critical to preventing such 
incidents and safeguarding workers in hazardous work environments. Consequently, a monitoring system tailored 
for construction safety during welding operations becomes imperative to mitigate the risk of fire accidents. This 
paper conducts a brief analysis of OSHA rules pertaining to welding work and introduces the iSafe Welding system, 
an advanced real-time safety monitoring and compliance enforcement solution designed specifically for 
construction site welding operations. Harnessing the real-time object detection algorithm YOLOv7 in conjunction 
with rule-based scene classification, the system excels in identifying potential safety violations. Rigorous 
evaluation, encompassing precision, recall, mean Average Precision (mAP), accuracy, and the F'1-Score, sheds 
light on its strengths and areas for improvement. The system showcases robust performance in rule-based scene 
classification, achieving high accuracy, precision, and recall rates. Notably, the iSafe Welding system demonstrates 
a formidable potential for enhancing construction site safety and regulatory compliance. Ongoing enhancements, 
including dataset expansion and model refinement, underscore its commitment to real-world deployment and its 
strength in ensuring worker safety. 


KEYWORDS: Safety monitoring, scene classification, welding work, fire prevention, construction safety, OSHA 
rules compliance 


1. INTRODUCTION 


The construction industry exhibits a pronounced prevalence of both fatal and non-fatal occupational incidents on 
a global scale (Hussain et al., 2022; Khan et al., 2023). Among the significant contributors to these casualties and 
injuries are fires, explosions, and burn-related incidents, often stemming from workers’ non-adherence to 
precautionary safety protocols during hot work. The primary causes of fires at construction sites encompass highly 
flammable substances, including foam insulation, gas cylinders, chemical storage facilities, and oil-based paints. 
The proximity of these materials to welding and cutting activities can serve as a precipitating factor for fires at 
construction sites (Xu et al., 2022). According to the Occupational Safety and Health Administration (OSHA) 
accident database, a total of 80 accidents occurred between July 2019 and July 2023 related to burning incidents 
at construction sites attributed to welding and cutting work. Among these recorded incidents, 22 were categorized 
as fatalities. These accidents occur when workers fail to utilize safety gear properly or neglect its use altogether, 
leading to exposure to various hazards such as chemical splashes, flying debris, intense light, harmful fumes, and 
more (Nill, 2019). For instance, during welding or cutting operations, the absence of proper eye protection can 
result in severe eye injuries, potentially leading to temporary or permanent vision loss. Additionally, if a worker 
does not take necessary precautions while performing welding or cutting tasks, and flammable materials or 
chemicals are stored nearby, there is a significant risk of explosions or fires at the workspace, posing serious harm 
to the worker. It is essential for workers to adhere to safety guidelines and utilize the appropriate safety equipment 
to prevent such incidents and safeguard their well-being in hazardous work environments. 


Nowadays, computer vision (CV) techniques have found applications in monitoring construction sites across 
various construction scenarios (Jeong et al., 2017). However, with regard to welding processes, researchers have 
focused on areas such as identifying welding defects (Ramadan et al., 2023; Wu et al., 2023), welding bead 
detection (JOHN, 2023), detecting welding quality (Yang et al., 2018), and classifying welding types (S. Chen et 
al., 2023; H. Liu et al., 2023). Chen et al. proposed YOLOVS based welding helmet use detection during the 
welding work (W. Chen et al., 2023). However, it is worth noting that there have been relatively limited efforts 
directed towards ensuring compliance with safety regulations or implementing monitoring mechanisms 
specifically during welding operations. Hence, the need for a comprehensive monitoring system in construction 
safety during welding operations is evident, primarily to mitigate the risk of fire accidents. To enhance safety at 
the construction site, this paper briefly analyzes OSHA rules related to welding work. Further, a computer vision- 
based monitoring system, “iSafe Welding System,” is proposed to ensure compliance with safety protocols during 
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welding work. The Convolution Neural Network (CNN)-based You Only Look Once (YOLO) version 7 model is 
trained for the object detection module, and a rule-based algorithm is developed to assess safety rules compliance 
that classify the scene as “safe” or “unsafe”. Moreover, a new dataset is collected from construction jobsite, web 
image scrapping, and generated synthetic data from OpenAI’s DALL.E.2. 


2. SAFETY RULE ANALYSIS 


The OSHA rule 1910.252 pertains to hot work, encompassing welding, cutting, and brazing activities. This 
regulatory framework is forked into distinct sections: section (a) addresses fire prevention and protection, while 
section (b) is devoted to personnel protection, encompassing guidelines pertaining to personal protective 
equipment (PPE) usage and the safe positioning of welding cables and equipment. A careful examination of these 
regulations reveals that, during the execution of welding operations, workers are mandated to use welding-specific 
PPE. Furthermore, provisions necessitate the presence of a fire extinguisher in close proximity, the employment 
of fire prevention measures such as the utilization of fire prevention nets, a prohibition on the presence of insulation 
or foam in the safe area, and the maintenance of a safe distance from chemicals and gas cylinders during welding 
activities. For a comprehensive exposition of OSHA rule 1910.252 relevant to welding, along with an outlining of 
requisite detection objects, readers are directed to Table 1 for further interpretation. 


Table 1. The details of OSHA rule 1910.252 related to welding and objects require for detection 


Sr. Rule Code Description Objects 
No. 
a) Fire prevention and protection 

1 (a)(1)(ii) Use guards to confine heat, sparks, and slag Fire 

2 (a)(2)(i) Combustible Material - Prevent exposure to sparks through floor Prevention 
openings, cracks, walls, doorways, and windows Net 

3 (a)(2)(ii) Fire Extinguishers - Maintain ready-to-use fire extinguishing Fire 
equipment Extinguisher 

4 (a)(2)(iii)(A)(1) Combustible material in building construction or contents closer than 
35 feet (10.7 m) to the point of operation. 

5 (a)(2)(4ii)(A)(2) Appreciable combustibles are more than 35 feet (10.7 m) away but are 

arenes Flammable 
easily ignited by sparks. : 
Material 

6 (a)(2)(11)(A)(3) Wall or floor openings within a 35-foot (10.7 m) radius expose 
combustible material in adjacent areas including concealed spaces. 

7 (a)(2)(i11)(A)(4) Combustible materials are adjacent to the opposite side of metal 
partitions, walls, ceilings, or roofs and are likely to ignite. 

b) Protection of personnel 

8 (b)(1) an Protection of Personnel - Clear placement of welding cable and Welding 
equipment machine 

9 (b)(2)(i)(A-D) Eye Protection - Helmets or hand shields for arc welding and cutting 

10 = (b)(2)(@)(B) Eye Protection - Goggles or suitable eye protection for gas welding or Worker and 
oxygen cutting Helmet with 

11 (b)(2)G)\(C) Eye Protection - Transparent face shields or goggles for resistance Ye shield 
welding or resistance brazing (PPE) 

12 (b)(2)@)(D) Eye Protection - Suitable goggles as needed for brazing operations 
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SECTION C - Al, DATA SCIENCE AND ANALYTICS 


3. METHODOLOGY 


This section describes the comprehensive methodology employed in the development and implementation of the 
“iSafe Welding system”, which comprises three main steps: dataset collection and preparation, training object 
detection model, and safety rules compliance. The details of these steps are as follows: 


Dataset Collection and Preparation | Object Detection Rule Compliance/Scene Classification 


detected objects = get_detected_objects{) 
if "FireExtinguisher” Is not In detected objects: 
Set SafetyStatus to "Unsafe" 
if "FlammableMaterial” Is in detected objects: 
Set SafetyStatus to “Unsafe” 
if "Worker" is detected: 
for each detected worker: 
if Worker does not have PPE: 
Set SafetyStatus to “Unsafe” 
if SafetyStatus is “Safe”: 
Set sceneStatus to “Safe” 
else: 
Set sceneStatus to “Unsafe” 
Output the safety assessment: “sceneStatus” 
if sceneStatus Is "Unsafe": 
generate an alert for immediste action. 


Data Collection 


Web 
Scrapping 


| 


Data Preparation 


| Job Site 


DALL.E.2 | 


Dataset 
Annotation 


duplicate 


images 


i Remove | 
Augmentation 


| Dataset 


Detected Objects 


Fig. 1. Methodology for training iSafe Welding System 


3.1 Dataset Collection and Preparation 


The dataset employed in this research comprises 633 meticulously curated images collected from construction 
jobsite, web image scrapping, and generated synthetic data from OpenAI’s DALL.E.2. The dataset was divided 
into training (511), validation (61), and testing (61) sets for the iSafe Welding system. This dataset includes 1,935 
annotated instances, categorized into five classes: “Welding Equipment” (621 instances), “Worker with PPE” (607 
instances), “Welding Machine” (348 instances), “Fire Extinguisher” (197 instances), and “Flammable Material” 
(162 instances). These categories represent essential elements and scenarios within welding environments, 
emphasizing equipment, worker safety, welding machinery, fire prevention measures, and flammable material 
management. The roboflow platform is used for annotating dataset by drawing bounding boxes. For data 
augmentation, a series of transformative techniques are employed exclusively on the training dataset using 
roboflow. These augmentation methods included horizontal flip, shear, hue adjustment, brightness variation, 
exposure modification, cutout, and the mosaic technique. The training set was increased 3 times of original training 
set after augmentation. 


3.2 Training Object Detection Model 


In pursuit of real-time object detection capabilities, iSafe Welding has strategically adopted single-stage detection 
algorithms. These algorithms are renowned for their efficiency and speed, offering high frame-per-second (fps) 
rates for rapid and real-time object detection (Diwan et al., 2023). The decision to favor single-stage detectors over 
two-stage detectors aligns with the project's primary objective of ensuring swift and accurate detection of objects 
during welding operations within the construction industry. In the realm of real-time object detection algorithms, 
two prominent options, Single Shot Multibox Detector (SSD) (W. Liu et al., 2016) and the YOLO series detectors, 
were considered for this research work. After careful evaluation, the YOLO series detectors were selected as the 
preferred choice due to their notable strengths in achieving a commendable balance between accuracy, as measured 
by mean Average Precision (mAP), and fps rates. Specifically, YOLOv7 (Wang et al., 2023) was chosen as the 
model of preference for training, enhancing its capabilities to excel in the task of object detection, a critical 
component of the iSafe Welding system aimed at enhancing safety and compliance during welding operations in 
the construction industry. 


YOLOv7 is a single-stage anchor-based object detector that uses a custom backbone network and a new head 
network. The basic YOLO model architecture (Long et al., 2020) is shown in the object detection module of Fig. 
1. The backbone is a convolutional neural network (CNN) that extracts features from the input image. YOLOv7 
uses a modified ELAN architecture for the backbone, which is more efficient and has better learning ability than 
the original ELAN architecture. The head is responsible for predicting bounding boxes and object classes. 
YOLOv7 uses a single-stage head, which means that it predicts bounding boxes and object classes directly from 
the features extracted by the backbone and neck. YOLOv7 uses a new anchor box selection algorithm that is more 
efficient and effective than the algorithm used in previous YOLO models. This helps to improve the accuracy of 
the model, especially for small objects. YOLOv7 has a reduced parameter count and computation compared to 
previous YOLO models. This makes it faster and more efficient to run on devices with limited resources (Wang et 
al., 2023). 
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The training process of the model was conducted using hardware resources consisting of an Intel Core i9-10900 
CPU, operating at 2.80GHz, complemented by 32 GB of RAM, and further accelerated by the inclusion of an RTX 
3090 graphics processing unit, boasting 24 GB of dedicated memory. The model was trained on 300 epochs, with 
a batch size parameter set to 16, and an input image size established at 640x640 pixels. In pursuit of model 
optimization, the YOLOv7 framework uses the default Stochastic Gradient Descent (SGD) optimizer, initialized 
with a learning rate of 0.01, culminating in a final learning rate of 0.1. Additional optimization parameters 
encompassed a weight decay factor of 0.0005 and a momentum coefficient of 0.9, collectively contributing to the 
refinement of the model's performance. 


3.3 Safety Rules Compliance/Rule-based Scene Classification 


After conducting an in-depth examination of the OSHA regulation regarding welding work, as detailed in Section 
2, significant and crucial findings were obtained. It was determined that the presence of specific safety parameters 
significantly impacts the safety status of a welding scenario. Specifically, if a welding task is being executed and 
no fire extinguisher is positioned in proximity, or if flammable materials are detected in the surrounding area, the 
scene is deemed unsafe as per OSHA guidelines. Furthermore, any detection results indicating the absence of PPE 
on a worker subsequently flag the scene as unsafe. Upon completion of the object detection module, the results 
are seamlessly transitioned to the safety rules compliance module. Herein, a algorithm is developed to examine 
the adherence to safety rules, discerning whether the detected scenario should be categorized as safe or unsafe, 
effectively enhancing safety and compliance during welding operations within the construction industry. The 
algorithm is described in Algorithm 1. 


Algorithm 1: Safety Rules Compliance Module 


Input: Results from the object detection module, including detected objects and their attributes. 
Output: Safety assessment for the detected welding scenario, categorized as "Safe" or "Unsafe." 


1. Initialize a variable SafetyStatus to "Safe" 


2. if "FireExtinguisher" is not in detected objects: // Check for the presence of a fire extinguisher 
Set SafetyStatus to "Unsafe" 

3. if "FlammableMaterial" is detected: // Check for the presence of flammable materials 
Set SafetyStatus to "Unsafe" 

4. if "Worker" is detected: // Check for worker PPE 


for each detected worker: 
if Worker does not have PPE: 
Set SafetyStatus to "Unsafe" 


5. if SafetyStatus is "Safe": // Perform a final safety assessment 
Set sceneStatus to "Safe" 
else: 
Set sceneStatus to "Unsafe" 
6. Output the safety assessment: "Safety Assessment: sceneStatus" 


N 


. if sceneStatus is "Unsafe": 


Log safety assessment results and generate an alert for immediate action. 


4. EVALUATION AND DISCUSSION 


The evaluation of the iSafe Welding system encompassed two key aspects: object detection performance and rule- 
based scene classification. These assessments aimed to validate the system's efficacy in real-time safety monitoring 
and compliance enforcement within construction site welding operations. 


4.1 Object Detection Performance 


In the context of object detection, essential metrics such as precision, recall, and mean Average Precision (mAP) 
were rigorously calculated to gauge the system's ability to accurately identify and localize objects of interest. 
Concurrently, accuracy and the F1-Score were employed for scene classification as "safe" or "unsafe." The results 
are presented in Fig. 2 and Table 2. A significant observation is the disparity in mAP scores between the validation 
and test sets. The validation set exhibited a notably higher mAP, suggesting a degree of overfitting to the training 
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ON C - Al, DATA SCIENCE AND ANALYTICS 


and validation datasets. This phenomenon highlights the need for model refinement to enhance generalization 
capabilities and ensure reliable performance in real-world scenarios. The class-specific analysis reveals that the 
"flammable materials" class achieved a perfect recall of 100%. This remarkable recall rate can be attributed to the 
dataset's inherent limitation, featuring only a single type of flammable material. Conversely, the "welding 
equipment" class exhibited a 67.4% mAP on the test set, primarily due to the small and slender nature of these 
objects, rendering them challenging to detect accurately. 


Table 2. Result of object detection model on validation and testing set with confidence threshold 0.5 


Dataset Precision % Recall % mAP@0.5 % 
Validation set 97.9 97.9 99.2 
Test Set 80.5 92.6 88.5 


BE. Nar £ P 
Fig. 2. Object detection and scene classification results of iSafe Welding System 


4.2 Rule-Based Scene Classification 


The rule-based scene classification component of the iSafe Welding system demonstrated its effectiveness in 
categorizing scenes as "safe" or "unsafe." The testing dataset was thoughtfully divided into these two categories, 
yielding 14 images classified as "safe" and 47 as "unsafe." Applying Algorithm 1 to these scenes led to notable 
results. The achieved accuracy, precision, recall, and Fl-score of 96.72%, 97.87%, 97.87%, and 97.8%, 
respectively, underscore the algorithm's proficiency in accurately classifying scenes based on safety criteria. These 
outcomes affirm the algorithm's potential to enhance safety compliance and enforcement in the context of welding 
operations. 


4.3 Future Directions 


To address the observed issue of overfitting and to further enhance the system's capabilities, future efforts will 
primarily focus on expanding the dataset. Additionally, while the dataset used in this study inherently ensures that 
all depicted workers are equipped with PPE, future dataset extensions will include instances of workers without 
PPE. Furthermore, this expansion will involve the inclusion of various flammable materials and augmenting the 
dataset with a more diverse set of welding scenarios set in varied environmental contexts. Such measures are 
anticipated to significantly enhance the model's generalization and real-world applicability. As the OSHA rules 
require proximity between welding equipment and flammable materials as shown in Section 2, the future work 
will utilize real-sense camera to find distance between objects to calculate safe distance. Further, the updated 
algorithm will classify scene as safe or unsafe based on the safe distance. 


5. CONCLUSION 


Computer vision techniques have emerged as valuable tools for enhancing safety and efficiency on construction 
sites. While previous research has concentrated on aspects such as defect identification, welding bead detection, 
quality assessment, and type classification, there has been a noticeable gap in addressing safety compliance during 
welding operations. This gap underscores the need for a comprehensive monitoring system to mitigate the risk of 
fire accidents and ensure compliance with safety regulations. This paper has provided a brief analysis of OSHA 
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tules related to welding work and introduced the iSafe Welding System, a computer vision-based monitoring 
solution designed to uphold safety protocols during welding operations. The integration of the YOLOv7 model for 
object detection, along with a rule-based algorithm for safety rule assessment, represents a robust approach to 
classifying scenes as safe or unsafe. Furthermore, the creation of a new dataset, comprising data from construction 
job sites, web image scraping, and synthetic data generated using OpenAI's DALL.E.2, enhances the system's 
adaptability and accuracy. The evaluation results demonstrate the system's effectiveness in real-time safety 
monitoring and compliance enforcement within construction site welding operations, with high precision and recall 
rates. The iSafe Welding System offers the potential to significantly improve workplace safety in the construction 
industry. By addressing the critical need for safety monitoring during welding, this system contributes to a safer 
construction environment, reducing the potential for accidents and improving overall workplace safety. Its 
applications extend to various industries requiring safety compliance and object detection, making it a valuable 
asset for enhancing safety and efficiency in dynamic work environments. 
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ABSTRACT: 


Efficient forklift operation is critical for construction site safety and project progress; yet, the construction industry 
deals with recurrent issues, including unauthorized forklift operation, operator drowsiness, visibility challenges, 
blind spots, and load placement errors. This paper introduces the "iSafe ForkLift," a comprehensive safety 
framework powered by computer vision, specifically designed to tackle these multifaceted safety challenges 
associated with forklift operations. The framework provides an array of integrated solutions, encompassing facial 
recognition for authorization, anomaly detection for behavior monitoring, stereo cameras for improved visibility, 
blind spot solutions, and load placement monitoring. Aligned with OSHA safety standards, it offers opportunities 
for enhanced forklift safety by addressing a broad spectrum of potential risks within a single, efficient framework. 
Systematically addressing multiple safety risks within this unified framework significantly elevates overall safety. 
Future studies should prioritize enhancing technology by merging computer vision with IoT to boost precision and 
safety, especially on challenging terrains, thereby elevating construction industry standards' reliability. 


KEYWORDS: Forklift operations, Computer vision, Safety framework, Operator drowsiness, Visibility challenges, 
OSHA standards, Regulatory compliance 


1. INTRODUCTION 


The construction industry, characterized by its dynamic and complex nature, is a realm where innovation and risk 
coexist. Here, heavy earth-moving machines, including forklifts, are integral, performing tasks that are pivotal for 
project progression. Forklifts are indispensable tools in this regard, capable of swiftly transporting heavy loads to 
various locations within the site. However, the operation of such machinery is filled with risks. Accidents involving 
heavy earth-moving machines are not uncommon and have led to severe, sometimes fatal, consequences, 
underscoring the critical need for enhanced safety protocols. Approximately 75 to 100 workers lose their lives 
in forklift accidents annually, with an average of roughly 87 fatalities per year. This number has seen a nearly 30% 
increase over the past decade (Forklift Accident Statistics, n.d.). According to the OSHA database, 1117 accidents 
occurred just because of forklifts (OSHA, 2023a). Operating forklifts on construction sites poses various 
challenges and risks, from unauthorized personnel attempting to use them to blind spots and improper load 
placement. Notably, there has been a pressing need to propose a comprehensive solution for these challenges, yet 
no researcher has put forward an all-encompassing approach to address them concurrently. 


iSafe ForkLift, a state-of-the-art monitoring framework powered by computer vision, is meticulously proposed to 
tackle the multifaceted safety challenges associated with forklift operations. The suggested framework provides a 
range of solutions: (1) Authorization through Face Recognition: Leveraging a camera installed within the forklift, 
the system ensures that only authorized personnel can operate the machinery. By utilizing advanced face 
recognition algorithms, it verifies the identity of the operator in real-time, preventing unauthorized access. (2). 
Anomaly Detection for Driver Behavior: Beyond just authorization, the system is equipped to monitor the behavior 
of the operator. Through anomaly detection algorithms, it can identify signs of Drowsiness. Detecting signs such 
as frequent yawning, heavy eyelids, or nodding off, which can be particularly dangerous when operating heavy 
machinery, Distraction or Physical Discomfort, or other abnormal behaviors, prompting immediate intervention. 
(3). Enhanced Visibility with Stereo Cameras: Addressing the perennial issue of sight blocks, stereo cameras are 
installed to provide drivers with a hidden view. This feature not only enhances visibility but also offers real-time 
data on the distance between the forklift tip and nearby objects, aiding in precise navigation. (4). Blind Spot 
Solutions: Blind spots, a significant hazard in forklift operations, are mitigated through strategically placed 
signalers or mirrors. Computer vision techniques continuously monitor these areas, ensuring that they remain clear 
and alerting the driver to potential obstructions. (5). Load Placement Monitoring: The proper placement of loads 
on the forklift's tipover is crucial for stability. Using cameras and computer vision techniques, the system assesses 
the positioning of loads, ensuring they are securely and correctly placed. 
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2. CURRENT MONITORING TECHNIQUES FOR SAFE FORK OPERATIONS 


Forklifts encounter several safety challenges, particularly in dynamic construction environments where loading 
and unloading activities are prevalent. While forklifts find extensive utilization in construction, the risk of 
accidents in such contexts is significantly elevated compared to other operational settings. Presently, a range of 
monitoring techniques are employed to mitigate these safety concerns. While there isn't a specific study focused 
on checking the authorized forklift operators for driving, numerous authors have explored the issue of abnormal 
driver behavior. Much attention has been devoted to the development of methods for detecting anomalies in this 
context (Amin et al., 2023; Okan & Rigoll, n.d.). Blind spot problem has been addressed by different researcher, 
(Shete et al., 2021) The implementation of an ultrasonic sensor to detect nearby obstacles in a forklift is an effective 
safety measure. However, to address blind spots and obstacles approaching from corners, the integration of convex 
lenses at these specific locations becomes imperative. These convex lenses serve as additional visual aids, 
enhancing the forklift operator's field of vision and ensuring a more comprehensive obstacle detection system 
Moreover, established standards dictate that forklift operation is restricted solely to authorized personnel, 
representing a fundamental safety measure in this context. This paper investigates factors influencing forklift 
operator safety and efficiency, including energy usage, training, IoT integration, ergonomic considerations, and 
worker drowsiness (Mediavilla, 2023). Emphasizing the significance of the matter, previous research did not 
encompass critical aspects of forklift operations utilizing computer vision technology, such as driver authorization, 
detecting abnormal behavior, mitigating blind spots, and optimizing load placement. Nevertheless, OSHA has 
played a vital role in addressing these factors, recognizing that neglecting them can result in severe accidents. 
These considerations are pivotal in the context of safety rules analysis, a topic to be elaborated on in the following 
section. 


3. SAFETY RULES ANALYSIS 


OSHA has established several crucial safety standards for industrial truck operations to mitigate risks and ensure 
workplace safety (OSHA, 2023b). These standards include prohibiting unauthorized individuals from riding on 
industrial trucks (1910.178(m)(3) & 1910.178(1)(3)). This measure prevents potential accidents such as falls, 
entanglements, or collisions that could result from non-operators being on board. Additionally, operators are 
required to avoid driving industrial trucks toward individuals positioned in front of fixed objects (1910.178(m)(1)). 
This rule emphasizes the importance of operator awareness and vigilance to prevent collisions that could lead to 
severe injuries or fatalities. Operators are also expected to maintain a forward-facing orientation while operating 
industrial trucks (29 CFR 1910.178(n)(4) & 29 CFR 1910.178(n)(6)). This enhances visibility, reducing the risk 
of accidents, especially in areas with pedestrian traffic or confined spaces. Furthermore, proper load handling is 
emphasized through the standards that require loads to be secure and correctly positioned (29 CFR 1910.178(0)(1) 
& 29 CFR 1910.178(0)(2)). Ensuring load stability is vital in preventing accidents such as tip-overs, which can 
result in injuries, equipment damage, and hazardous material spills. Lastly, while not explicitly stated in the 
provided standards, it is crucial for drivers to remain attentive (Abnormal Behavior - General Rule). This means 
avoiding behaviors that could distract them or impair their ability to operate the vehicle safely, as such actions can 
lead to accidents and must be strictly prohibited. Table 1 outlines OSHA regulations, providing their associated 
particulars along with proposed solutions. 


Table 1 Safety Rules for Forklift 


Sr. OSHA Standards Description Case 7 
A Proposed Solutions 

No Scenario 

i 1910.178(m)(3) No unauthorized operator riding on | Unauthorized Face Recognition 

& 1910.178(1)(3) trucks Person $ 
Never drive trucks toward anyone Depth Estimation/Object 
2 1910.178 1 truck 
(my) in front of fixed objects Struck by Detection 
Blind 


29 CFR 1910.178(n)(4) 


3 29 CFR 1910.178(n)(6) The driver must face forward path Spot/Blocked Signaler/Barriers/Mirrors 
Vision 
4 29 CFR 1910.178(0)(1) Secure and properly positioned Tip over Measure distance of load 
29 CFR 1910.178(0)(2) loads 4 placement on fork tip 
A l 
5 General rule Driver must be attentive pee Anomaly Detection 
Behavior 
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4. PROPOSE FRAMEWORK 


This paper introduces a safety-oriented computer vision framework comprising five distinct modules: Signaler 
Detection, Face Recognition, Anomaly Detection, Signaler Detection, Load Monitoring, and Visibility 
Enhancement. These modules are seamlessly integrated into a central server, responsible for processing and storing 
information in a database. Additionally, the system is designed to trigger alarms whenever an unsafe event is 
detected. The setup employs three cameras: Models 1 and 2 are affixed to internal cameras to monitor operators 
and identify abnormal behaviors, meanwhile Model 3 utilizes a depth camera mounted on the tip of a forklift to 
improve visibility and Models 4 and 5 are connected to the Signaler Detection and Load Placement Monitoring 
components, respectively as depicted in Figure 1. 


sy, 
Internal 
Ad-Camera 


FACI 


: RECOGNITION 
Bec BB 
: Oe a 
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if condition is 
not satisfied 


ANOMALY DETECTION 


Generate an Alarm to alert 
stile iranat whet strained nied a r eae nt darned the forklitt’s operator 


Figure 1: Forklift Operations Framework 


4.1 Authorization through face recognition 


The proposed framework for authorization employs an advanced facial recognition methodology for identity 
verification and access control, utilizing cutting-edge algorithms and computational methodologies to scrutinize 
facial features meticulously as shown in Figure . This comprehensive process initiates with capturing a facial image, 
subsequently contrasting the unique characteristics of this image against a meticulously curated database of stored 
facial templates. The seamless integration of state-of-the-art deep learning models, specifically Convolutional 
Neural Networks (CNNs) (Li et al., 2022), ensures the proficient extraction and meticulous analysis of detailed 
facial features, granting a high degree of accuracy and security in identity verification. This method emerges as 
crucial in numerous secure environments, providing swift and efficient determination of access eligibility 
predicated on the consequential recognition results. The scientific orchestration of our face recognition for 
authorization unifies the two stages first a facial feature database its elements are pre-processing (Bradski, 2000) 
to specify faces and feature extraction, attributing a unique ID to each face and storing these features for the second 
step. The second step involve inferencing for face detection for specific ID in a scenario its components are a deep 
learning-based feature extractor, a point based matching model (Lindenberger et al., 2023) and a detector for facial 
feature, while these are for detection part the post processing is crucial step which give authorization. 


This amalgamation of techniques ensures the refinement of the recognition results. The subsequent post-processing 
step manages the comparison results, facilitating the final step of authorization, thereby enhancing the reliability 
and robustness of the system. In essence, our approach (Figure 2) in blending these advanced technologies and 
methodologies culminates in the development of a secure, reliable, and efficient system, pivotal in reinforcing 
security measures and averting unauthorized access. Through the utilization of precise feature comparison and 
advanced post-processing techniques, our objective is to offer a sophisticated solution that effectively addresses 
the diverse challenges associated with secure authorization. 
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4.2 Anomaly detection for driver behavior 


In line with the objective of promoting safety and efficiency, the structured framework includes specialized 
anomaly detection algorithms. These algorithms are tailored to identify signs related to drowsiness, including 
driver fatigue, physical discomfort, and unusual behaviors requiring immediate attention. This is particularly 
crucial in scenarios involving heavy machinery operation within the construction industry, where such anomalies 
can escalate into significant safety risks and losses. The implemented algorithms meticulously observe and analyze 
various behavioral cues, such as the frequency of yawning, heavy eyelid drooping, moments of nodding off, or 
any behavior that deviates from the established norm. These cues indicate a potential decline in alertness or an 
increase in discomfort, both of which can adversely impact machinery operation. The proposed framework aims 
to identify drowsiness in forklift drivers by analyzing their facial expressions, as depicted in Figure . This system 
operates in real-time by collecting a dataset of images and labeling them for training purposes. The initial step 
involves enhancing the image quality and extracting relevant features through image pre-processing. For this 
purpose, a feature extractor based on the work of (Amir, Gandelsman et al., 2021) is employed. This feature 
extractor specializes in extracting detailed features from the facial images of drivers. 


Face-Detection 
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Post-Processing 


Figure 2: Face-Detection Framework for Authorization 
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Figure 3: Framework for Forklift drowsiness detection 


Once the features are extracted, a state-of-the-art deep learning model is utilized for classification. Specifically, a 
fully connected network, as described by (Schwing and Urtasun, 2015), is attached to the feature extractor. This 
network is trained to classify various images based on the driver's condition, distinguishing between normal and 
drowsy states. By utilizing deep learning, the system can learn intricate patterns and accurately categorize the 
driver's condition in real-time. After the classification step, a post-processing stage is implemented to activate an 
alert system based on the identified states. In practical terms, if the system detects drowsiness in the driver, it can 
promptly issue an alert or trigger a warning mechanism. This rapid response is crucial for preventing potential 
safety hazards, as it enables corrective actions or interventions to be initiated before accidents or injuries occur. 


The application of advanced computational techniques and deep learning models enables the system to achieve a 
high level of precision in detecting driver states. It can effectively differentiate between normal and unusual 
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behaviors, thereby ensuring the safety and well-being of individuals involved in forklift operations within 
construction sites. By promptly recognizing and addressing drowsiness, the system contributes to maintaining an 
overall safe working environment in construction settings. 


4.3 Enhanced visibility with stereo cameras 


In the scenario depicted in Figure , a challenge arises when the forklift is loaded, blocking the operator's direct line 
of sight to the area directly in front of the forklift. To tackle this issue, a stereo camera system will be installed at 
the front of the forklift's fork tip. This setup not only offers an unobstructed view of the concealed area but also 
employs computer vision-based object detection algorithms, such as yolov8 (ultralytistic, n.d.), to identify objects 
and determine their distance from the forklift. This comprehensive approach enhances safety by addressing vision 
obstruction concerns when the forklift is carrying a load. 
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Figure 4: Visibility Framework with stereo camera 


4.4 Blind spot solutions 


Blind spots that arise when objects approach a forklift from a corner, as illustrated in Figure necessitate specific 
safety measures. In compliance with [29 CFR 1910.178(n)(4)] and [29 CFR 1910.178(n)(6)] mentioned in (OSHA, 
2023b), it is essential to install signalers and convex lens mirrors at every corner within the work area where 
forklift operations take place. These signalers can be monitored through object detection technology. If a signaler 
is absent or not in its designated location, an alert message is generated, indicating unsafe conditions. 


To ensure comprehensive coverage of the construction site where forklifts operate, multiple cameras need to be 
strategically installed. These cameras are connected to a server via RTPS (Real-Time Publish-Subscribe) protocol, 
and an object detection model is deployed on the server to enhance safety and monitor blind spots effectively. 
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Figure 5: Signaler Detection 


4.5 Load placement monitoring 


Properly centering the load on a forklift is essential for safety, as highlighted in OSHA's 29 CFR 1910.178(0)(1) 
and 1910.178(0)(2) regulations. Incorrect load positioning increases the risk of tipping over, potentially causing 
operator injuries, equipment damage, and harm to nearby structures. Maintaining load stability and preventing tip- 
over accidents are critical. 


Extensive research on forklift safety underscores the significance of load centering. Misalignment of loads elevates 
the risk of tip-overs, diminishes stability and maneuverability, and can lead to structural damage and injuries. 
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Adhering to OSHA regulations is imperative. Gavanski's 2022 study identifies safety concerns with forklifts, 
serving as a valuable reference for safety enhancements (Gavanski, 2022). Furthermore, Xia et al. (2023) 
introduced a center of gravity estimation algorithm for counterbalanced forklifts, achieving precise position control 
(Xia et al., 2023). 


In this study, we introduce a computer vision-based framework (Figure 6) to oversee forklift operations during 
material loading and unloading. Utilizing advanced computational techniques and deep learning models, this 
system accurately identifies objects and reliably estimates the distance between the load's central point and the 
forklift's front wheel. The proposed framework plays a crucial role in enhancing safety at construction sites, 
benefiting both forklift operators and the machinery itself. Its primary function is to promptly detect and correct 
any load positioning issues relative to the forklift's mast, contributing significantly to a safe construction site 
environment. 


ClassiNeation — 


Convolution Layers | l 


Figure 6: Forklift and Load Detection 


5. DISCUSSION & CONCLUSION 


The "iSafe ForkLift" framework represents a comprehensive solution aimed at enhancing safety in construction 
site forklift operations through the integration of advanced computer vision technology. This integrated system 
encompasses sophisticated features such as facial recognition for driver authentication, anomaly detection to 
mitigate operator drowsiness, stereo cameras to augment visibility, blind spot solutions, and load placement 
monitoring to preempt tip-over incidents. This framework stands out due to its holistic approach, a departure from 
conventional solutions that primarily address safety concerns individually. By systematically addressing multiple 
safety risks within a singular, efficient framework, it significantly enhances overall safety. However, there are 
certain limitations that necessitate careful consideration and foster discussions on technology, costs, safety, and 
compliance, with iSafe ForkLift improving forklift operations and worker well-being. The effectiveness of the 
framework is contingent upon the dependability of its technology, which may be vulnerable to adverse 
environmental conditions. Challenges may arise in accurately distinguishing between workers and signalers 
through computer vision. Acceptance among operators, coupled with concerns regarding data security and privacy 
management. Pertinent factors such as initial investment costs, ongoing maintenance, and operator training may 
pose challenges. Furthermore, the framework beckons opportunities for refinement, particularly concerning the 
reduction of false positives in anomaly detection and scalability. Following OSHA's safety protocols significantly 
boosts workplace safety. The integration of computer vision and IoT technologies enhances operational precision, 
while vigilance on uneven or challenging surfaces ensures stability and safety, thereby augmenting overall safety 
in the dynamic construction industry. In future work, we will delve deeper into these regulations by introducing 
dedicated computer vision solutions to enhance forklift safety across various industries. 
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AUTOMATED EXTRACTION OF BRIDGE GRADIENT FROM 
DRAWINGS USING DEEP LEARNING 
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Ruhr University Bochum, Germany 


ABSTRACT: Digital methods such as Building Information Modeling (BIM) can be leveraged, to improve the 
efficiency of maintenance planning of bridges. However, this requires digital building models, which are rarely 
available. Consequently, these models must be created retrospectively, which is time-consuming when done 
manually. Naturally, there is a great interest in the industry to automate the process of retro-digitization. This 
paper contributes to these efforts by proposing a multistage pipeline to automatically extract the gradient of a 
bridge from pixel-based construction drawings using deep learning. The bridge gradient, a key element of the 
structures axis, is critical for describing the elevation profile and axis slope. This information is implicitly 
contained in the longitudinal view of bridge drawings as gradient symbols. To extract this information, the well- 
established object detection model YOLOv5 is employed to locate the gradient symbols inside the drawings. 
Subsequently, EasyOCR and heuristic rules are applied to extract the relevant gradient parameters associated 
with each detected symbol. The extracted parameters are then exported in a machine-interpretable format to 
facilitate seamless integration into other applications. The results show a promising 98% accuracy in symbol 
detection and an overall accuracy of 70%. Consequently, the pipeline represents a significant advance in 
automating the retro-digitization process for existing bridges by reducing the time and effort required. 


KEYWORDS: Building Information Modeling, Computer Vision, Deep Learning, Symbol Detection, Optical 
Character Recognition, Construction Drawings 


1. INTRODUCTION 


Bridges play a central role in the transport network, as they are an essential element that creates connections and 
thus enables the transport of people and goods. To ensure their functionality and safe operation, regular inspections 
and effective maintenance management are of utmost importance. To support efficient maintenance planning, 
digital methods such as Building Information Modeling (BIM) can be employed. BIM refers to a digital 
collaboration method based on the creation and interdisciplinary exchange of digital building models (Borrmann 
et al., 2018). These models combine semantic information with geometric representations, acting as a single and 
central source of continuously enriching project information. Informed decisions can thus be made based on 
accurate and current data. Despite the potential BIM offers for all lifecycle phases, especially for the operation and 
maintenance (O&M) phase, it is most commonly used in the design phase of a construction project (Durdyev et 
al., 2022). 


The limited utilization of BIM in the O&M phase arises from the requirement for digital as-built models of the 
structure, which are often not available. In many cases, this is due to the fact that the buildings were designed and 
constructed without the implementation of BIM. Therefore, these models have to be created retrospectively. Since 
this is a time-consuming manual task, there is a major research effort to assist or automate the process through the 
use of artificial intelligence (Schénfelder et al., 2023). 


While many different sources of information can be used to create a digital building model of an existing structure, 
construction drawings are the most accessible. They not only contain geometric and semantic information about 
the building but also describe the internal structure of the components or building elements that are obstructed, 
e.g., buried underground. Therefore, drawings are an important source of information. This research contributes to 
the automatic creation of digital models from construction drawings by proposing a multi-stage pipeline for the 
automatic extraction of bridge gradient information from pixel-based construction drawings. 


The bridge gradient illustrates the elevation profile of the bridge's axis and is, therefore, an essential information 
for a precise reconstruction of the superstructure's geometry. The course of the gradient is contained in the 
longitudinal view. It is implicitly described through gradient symbols that hold important details about elevation 
and slope at specific points along the bridge axis. Therefore, the pipeline encompasses several stages to 
automatically extract the gradient information from multiple locations and link the information. First, the pipeline 
utilizes state-of-the-art deep learning-based methods to detect the gradient symbols within the drawings. 
Subsequently, the text information associated with each symbol is extracted using optical character recognition 
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Fig. 1: Workflow diagram of the multi-stage pipeline for extracting the gradient of the bridge axis. 


(OCR). Finally, the information is consolidated and harmonized into a structured data schema. The paper gives 
insight into the individual process steps and evaluates the performance of the proposed pipeline on a set of real- 
world bridge drawings. 


The remaining sections of the paper are organized as follows: Section 2 reviews recent research publications on 
automatic reconstruction of digital models from drawings. The implemented pipeline is described in Section 3, 
with a detailed explanation of each step. Section 4 shows the results obtained when applying the proposed pipeline 
to real drawings. Finally, Section 5 discusses the results and limitations of this study and outlines a perspective for 
further research. 


2. RELATED WORK 


So far, little research has been published on digitizing technical drawings (Moreno-Garcia et al., 2019). Only a 
few of these publications deal with the (semi-)automatic analysis of drawings for infrastructures. Poku-Agyemang 
& Reiterer (2023) proposes a semi-automated process that utilizes the Douglas-Peucker algorithm to detect the 
corner points of illustrated components. These points are then used to reconstruct the component exterior edges, 
ultimately creating the bridge’s geometry. A different semi-automatic framework is proposed by Akanbi & Zhang 
(2022). The proposed process involves converting the drawings into a vector-based data format, aligning the 
contained views, and finally using the extracted information to reconstruct the bridge’s geometry. The 
reconstructed digital model is exported in the IFC (Industry Foundation Classes) data format. An approach based 
on deep learning methods is introduced by Mafipour et a. (2023). The authors present a pipeline employing 
YOLOv5 and CRAFT to automatically detect the individual components and texts in the drawings. The detected 
objects are then clustered based on their labels, as each component may appear in different views of the structure. 
These views are provided to an expert who uses this information to create the bridge based on a predefined 
parametric model manually. In contrast, Faltin et al. (2023) proposes a different approach for linking the views in 
construction drawings. The authors utilize FasterRCNN for detecting section symbols that illustrate the 
interconnections between views. Each detected symbol is uniquely identified through the section reference, 
allowing it to be mapped to the corresponding view using OCR on the view title. The interconnections are 
established across all views contained in a drawing set for a specific structure. 


Overall, a larger number of publications exist on the reconstruction of high-rise building models from drawings. 
Wei et al. (2022) proposes a pipeline for detecting and reconstructing walls from floor plans. Firstly, the drawing 
is divided into patches, and the ResNet model is employed to identify patches containing walls. The walls are then 
detected in the positive patches using YOLOv3. All detections are merged to enable the utilization of Dynamo to 
create the digital model. Similarly, Zhao et al. (2020) uses the YOLO object detector to locate structural elements 
in the column structure and generate framework plan images. In a subsequent study, Zhao et al. (2021) continues 
the research by incorporating the superior Faster R-CNN model and introducing the creation of an IFC file from 
the extracted information. Kim et al. (2021) propose an approach for the detection of rooms, walls, and openings 
in floor plan images. The authors use a conditional generative adversarial network to create a heat map of the 
intersections and perform a style transfer to the floor plan image. Using the heat map of the connection points, the 
walls and openings are vectorized, which provides important information for recreating the building geometry. 


However, to the best of the author’s knowledge, no publication has been made that addressed the extraction of the 
bridge gradient from pixel-based drawings. This research aims to close the identified gap. 


3. METHODOLOGY 


A detailed explanation of the implemented methods is provided in the following section. Fig. 1 presents an 
overview of the proposed process. The input for the pipeline is a pixel-based longitudinal view of the bridge, as it 
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contains the required gradient information. In this paper, it is assumed that the view has already been extracted 
from the complete drawing set, which the engineer can do manually in advance. The pipeline first preprocesses 
(1) the view by dividing it into smaller patches to enable the symbol detection. The gradient symbols (2) are 
detected within these patches, and for each symbol, the associated gradient parameters (3) are extracted. Finally, 
the information is exported in a structured format to facilitate its utilization within BIM modeling software. 


3.1 Dataset Creation and Annotation 


To detect the gradient symbol in the drawings, the state-of-the-art object recognition network YOLOv5 (Jocher et 
al., 2022) is employed. Training the network requires a dataset of bridge construction drawings manually annotated 
with the gradient symbol. However, since the gradient symbol only occurs in small numbers in a longitudinal view, 
this results in a limited number of training data points. This is insufficient to receive a well-trained model. 
Therefore, the training data is synthetically generated, using a copy-and-paste strategy following Faltin et al. 
(2023). The real annotated data is only used to test the network. 


pees 


4 252428 


Fig. 2: Variation in created gradient symbols, sizes ranging from 85x85 to 20x20 pixels. 


For the synthetic data, a template set of 14 unique gradient symbols is created and employed in the process (cf. 
Figure 2). These symbols vary in size from 85x85 to 20x20 pixels, shape, and texture, providing increased diversity 
in order to improve the models' ability to generalize. 


The gradient symbol is chosen from the available template set and is randomly inserted into the background images. 
These background images are randomly cropped from bridge construction drawings which do not contain the 
gradient symbol and consist of various segments of construction drawings with different pixel sizes. To ensure 
compatibility with the YOLOv5 model, the background images are uniformly cropped to a standardized size of 
640 x 640 pixels. This process is repeated multiple times to generate different combinations enhancing the diversity 
of the synthetic dataset. Fig. 3 presents some exemplary results of the method. 
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Fig. 3: Exemplary training images with a side length of 640 by 640 pixels synthetically generated using the 
proposed copy-paste method. 
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In total a number of 980 training images and 280 validation images are generated, resulting in the dataset named 
Symbols. Additionally, 62 real different images extracted from real bridge drawings are used for testing. According 
to Jocher et al. (2022) adding up to 10% of empty background images to the training and validation dataset 
enhances the object detection performance. Hence, 128 images are added to the training data, while 32 images are 
added to the validation data. This dataset is called Symbols+BG. Table 1 provides an overview of the final datasets. 


Table 1: Overview of the generated data sets. Synthetic images are only used for training and validation. 


No. of synth. training No. of synth. No. of real testing 
Content : er eer : 
images validation images images 
Symbols Symbols only 980 280 62 
Symbols+BG Symbols and empty background 1108 312 62 


3.2 Preprocessing and Gradient Symbol Detection 


In order for the YOLOvS5 model to handle the large image sizes of the input longitudinal views, a sliding window 
approach is employed. As shown in Fig. 4, as a first step a 640 by 640 pixel sized window is shifted across the 
image in increments of 330 pixels, ultimately covering the entire image. This step ensures that the gradient symbol 
is displayed in its entirety in at least one window, improving the detection performance. The trained YOLOvS5 
detector individually processes the windows, and the detected symbol is recorded. In cases where a symbol is 
detected in multiple overlapping windows, non-maximum suppression is employed to mitigate the possibility of 
double detections. For the final detections, rectangular regions are cropped from the original input image, 
extending 330 pixels in each direction (see Fig. 4), which is found to be sufficient. These cropped regions are 
further processed in the gradient parameter extraction, as explained in the following section. 


Preprocessing 


Construction drawing 1. Sliding window approach 2. Gradient detection and cropping the image 
330 px 


330 px 


——?' 
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Fig. 4: Preprocessing of the construction drawing and detecting the gradient within the cropped area. 
3.3 Gradient Parameter Extraction 


After detecting the gradient symbols, the associated gradient parameters are extracted. To reliably recognize the 
parameters, the EasyOCR! text recognition model is used. EasyOCR combines the CRAFT (Baek et al. 2019) 
network and the text recognition network CRNN (Shi et al. 2018), since it is specifically designed for OCR. 
Therefore, EasyOCR provides robust capabilities for recognizing and extracting text from images. EasyOCR is 
employed in each region, as provided by the results from section 3.2. Within these regions, the text may appear 
rotated, reducing the EasyOCR model's recognition accuracy. To address this issue, the OCR engine is employed 
on the original region and a 90-degree rotated version. In addition, various image pre-processing techniques, e.g., 
adjustments to brightness and contrast, are applied to further improve the text recognition results. 


An additional challenge is that not all text in the regions is relevant for the gradient reconstruction. Therefore, to 
ensure that only the relevant text is detected for further analysis, filtering, and string-matching techniques are 
implemented. 


' EasyOCR: https://github.com/JaidedAI/EasyOCR (Accessed 1 August 2023). 
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Fig. 5: Exemplary representation of the recognized texts in the original (left) and 90° rotated (right) patches. The 
recognized texts are marked in red, and the parameters are marked in blue. The crossed-out text passages cannot 
be assigned to any parameter and are, therefore, ignored. 


In Fig. 5, the process of rotation, filtering, and string-matching is demonstrated. For instance, the text 'LandstraBe 
001' is irrelevant for the gradient detection and thus discarded. This filtering process is accomplished by identifying 
specific characters, such as KM, TS, or T, as depicted by blue in Fig. 5. Subsequently, the values that appear in the 
same horizontal line (cf. Fig. 5 marked with red rectangles) are matched with the identifying specific characters. 
In contrast, values located far away from the clusters are discarded. This process filters out unwanted text so that 
only relevant information is used in further analysis. 


4. TRAINING & RESULTS 
4.1 Training Process 


Three different model sizes of YOLOVS are trained and their performance is compared: YOLOvS5m, YOLOv5I, 
and YOLOvSx. Each model trains on both the Symbols and Symbols+BG dataset using a NVIDIA A100 SXM4 40 
GB graphics card. The models trained on Symbols are referred to as YOLOv5m, YOLOvSI, and YOLOv5x, while 
the ones trained on Symbols+BG are denoted as YOLOv5m_b, YOLOvSI b, and YOLOvS5x_b. During training a 
batch size of 56 is used for a maximum of 300 epochs. Backpropagation is performed using a learning rate of 0.01 
with a stochastic gradient descent (SGD) optimizer (Robbins & Monro, 1951). 


4.2 Symbol Detection Results 


The YOLOvS5 models are evaluated on 62 test images. Several key evaluation metrics are analyzed: precision, 
recall, intersection over union (IoU), mean average precision (mAP) at an IoU threshold of 0.50 (mAP@0.50), and 
mAP across IoU thresholds from 0.50 to 0.95 (mAP@0.50:0.95). The IoU measures the overlap between the 
predicted and ground truth bounding box. Precision measures the proportion of true positive detections out of all 
positive detections made by the model, indicating the accuracy of the model's predictions. On the other hand, recall 
represents the proportion of actual gradient symbols correctly identified by the models, measuring the model's 
ability to capture all instances of the target object. mAP@0.50 is a metric that enables the evaluation of the model's 
precision on average when there is at least a 50% IoU with the ground truth bounding boxes. In simpler terms, 
mAP@0.50 allows the assessment of how well the model performs in accurately detecting and localizing the 
gradient symbol, even when there is a moderate level of overlap between the predicted bounding boxes and the 
actual objects in the images. mAP@0.50:0.95 offers a more comprehensive evaluation by considering a range of 
IoU thresholds, enabling a better understanding of the model's performance across different levels of overlap. 
Since the gradient symbols are relatively small, this study considers an IoU of 50% sufficient. 


The test results of different trained YOLOv5 models are shown in Table 2. It can be observed that the models 
achieve high precision scores, indicating their capability to accurately detect the gradient symbol in the real image 
dataset. Moreover, the models show varying levels of recall. YOLOv5m archives the highest recall score of 0.957, 
while YOLOv5I scores the lowest value of 0.882. Overall, the models successfully identify the gradient symbols 
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but with some variation in recall performance. 


Table 2: Gradient detection results for different models. The bold fonts indicate the best results. 


YOLOv5m YOLOv51 YOLOv5x YOLOv5m _b YOLOvSL b YOLOv5x_b 
Precision 0.998 1 0.984 1 0.999 0.999 
Recall 0.957 0.882 0.906 0.913 0.942 0.957 
mAP} 0.965 0.942 0.966 0.958 0.977 0.976 
MAPD 0.95 0.878 0.865 0.882 0.874 0.89 0.894 


For the mAP@0.50 all models demonstrate strong performance. YOLOv51_b achieves the highest mAP@0.50 of 
0.977, followed closely by YOLOv5x_b with 0.976. YOLOv51 achieves the lowest mAP@0.50 score of 0.942. 
Considering the mAP@0.50:0.95 all models have consistently high scores. YOLOv5x_b achieves the highest score 
of 0.894, closely followed by YOLOvSI b with 0.89. YOLOvS1 shows the lowest mAP@0.50:0.95 score of 0.865. 
Overall, the models indicate a good performance across the range of IoU thresholds. Considering the detection 
speed and mAP@0.50 performance YOLOVS| b is selected as the best model for this application. Some detection 
results of gradient symbols are presented in Fig. 6. 


4.3 Overall Pipeline Results 


The models successfully detect the symbols despite variations such as rotation and partial view. To assess the 
performance of the overall pipeline, it is tested on four real longitudinal views, each containing multiple gradient 
symbols. To evaluate the OCR accuracy, a character-level analysis is performed. The recognized characters 
extracted by EasyOCR are compared to the expected gradient parameters associated with each drawing. The 
accuracy is then calculated by determining the percentage of correctly recognized parameters relative to the total 
number of parameters present. 
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Fig. 6: Example results of gradient symbol detection. 


The evaluation results reveal an overall accuracy of 70%. This shows that the pipeline can successfully extract 
gradient parameters from most of the drawings. However, there is room for improvement, especially in cases where 
parameter recognition becomes difficult due to variations in fonts and image quality. 
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5. DISCUSSION & CONCLUSION 


The results of this study demonstrate the effectiveness of the proposed multistage pipeline for automatically 
extracting bridge gradient information from pixel-based construction drawings using deep learning techniques. 
The YOLOvS models trained on different datasets showed high precision scores on the test dataset, indicating their 
capability to detect gradient symbols accurately. The results indicate that training the models on the synthetic 
dataset is beneficial to overcome a lack of data. The addition of background images has improved the overall 
performance of the models. 


The evaluation metrics, including mAP@0.50 and mAP@0.50:0.95, revealed strong performance across all 
models. Notably, YOLOvSL b achieved the highest mAP@0.50 score with 98%, making it the most suitable model 
for this application considering detection speed and accuracy. The detection results of gradient symbols further 
illustrate the pipeline's ability to successfully identify symbols despite variations in rotation and partial views. The 
pipeline’s overall performance in extracting gradient parameters from real bridge drawings is promising, with an 
OCR accuracy of approximately 70%. However, there is still room for improvement, especially in cases with 
challenging text recognition due to varying fonts, and image quality. 


In conclusion, the proposed pipeline presents an effective approach for the retro-digitization of existing bridges, 
significantly reducing the time and effort required for this crucial task. The successful extraction of gradient 
information from construction drawings holds great potential for improving bridge asset management and 
maintenance planning. For future research, fine-tuning, and optimization of the OCR component could further 
enhance accuracy and pave the way for broader applications. 


REFERENCES 


Akanbi, T., & Zhang, J. (2022). Semi-Automated Generation of 3D Bridge Models from 2D PDF Bridge Drawings. 
In F. Jazizadeh, T. Shealy, & M. J. Garvin (Eds.), Construction Research Congress 2022 (pp. 1347-1354). 
https://doi.org/10.1061/9780784483961.141 


Baek, Y., Lee, B., Han, D., Yun, S. and Lee, H. (2019). Character Region Awareness for Text Detection. In the 
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19365-19374. 
https://doi.org/10.1109/CVPR.2019.00959 


Borrmann, A., König, M., Koch, C., & Beetz, J. (2018). Building Information Modeling: Why? What? How? In 
A. Borrmann, M. König, C. Koch & J. Beetz (Eds.), Building Information Modeling. Springer. 
https://doi.org/10.1007/978-3-319-92862-3_ 1 


Durdyev, S., Ashour, M., Connelly, S., & Mahdiyar, A. (2022). Barriers to the implementation of Building 
Information Modelling (BIM) for facility management. Journal of Building Engineering, 46, Article 103736. 
https://doi.org/10.1016/j.jobe.202 1.103736 


Faltin, B., Schonfelder, P., & König, M. (2023). Inferring Interconnections of Construction Drawings for Bridges 
Using Deep Learning-based Methods. In E. Hjelseth, S. F. Sujan & R. J. Scherer (Eds.), ECPPM 2022-eWork and 
eBusiness in Architecture, Engineering and Construction 2022 (pp. 343-350). CRC Press. 
https://doi.org/10.1201/9781003354222 


Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., Kwon, Y., Michael, K., TaoXie, Fang, J. imyhxy, Lorna, Zan 
Yifu, Wong, C., V, A., Montes, D., Wang, Z., Fati, C., Nadar, J., Laughing, ... Jain, M. (2022). ultralytics/yolov5: 
v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo. https://doi.org/10.5281/zenodo.3908559 


Kim, S., Park, S., Kim, H., & Yu, K. (2021). Deep Floor Plan Analysis for Complicated Drawings Based on Style 
Transfer. Journal of Computing in Civil Engineering, 35(2), Article 04020066. 
https://doi.org/10.1061//ASCE)CP.1943-5487.0000942 


Mafipour, M. S., Ahmed, D., Vilgertshofer, S., & Borrmann, A. (2023). Digitalization of 2D Bridge Drawings 
Using Deep Learning Models. The 30th EG-ICE: International Conference on Intelligent Computing in 
Engineering. https://www.ucl.ac.uk/bartlett/construction/sites/bartlett_ 
construction/files/digitalization_of 2d bridge drawings using deep learning models.pdf 


Moreno-Garcia, C. F., Elyan, E., & Jayne, C. (2019). New trends on digitisation of complex engineering drawings. 


689 


Neural computing and Applications, 31, 1695-1712. https://doi.org/10.1007/s0052 1-018-3583-1 


Poku-Agyemang, K. N., & Reiterer, A. (2023). 3D Reconstruction from 2D Plans Exemplified by Bridge 
Structures. Remote Sensing, 15(3), 677. https://doi.org/10.3390/rs 15030677 


Robbins, H., & Monro, S. (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 
22(3), 400—407. http://www.jstor.org/stable/2236626 


Schonfelder, P., Aziz, A., Faltin, B., & König, M. (2023). Automating the retrospective generation of As-is BIM 
models using machine’ learning. Automation in Construction, 152, Article 104937. 
https://doi.org/10.1016/j.autcon.2023.104937 


Shi, B., Bai, X., & Yao, C. (2017). An End-to-End Trainable Neural Network for Image-Based Sequence 
Recognition and Its Application to Scene Text Recognition. JEEE transactions on pattern analysis and machine 
intelligence, 39(11), 2298-2304. https://doi.org/10.1109/TPAMI.2016.2646371 


Wei, C., Gupta, M., & Czerniawski, T. (2022). Automated Wall Detection in 2D CAD Drawings to Create Digital 
3D Models. Proceedings of the 39th International Symposium on Automation and Robotics in Construction (pp. 
152-158). IAARC Publications. https://doi.org/10.22260/ISARC2022/0023 


Zhao, Y., Deng, X., & Lai, H. (2020). A Deep Learning-Based Method to Detect Components from Scanned 
Structural Drawings for Reconstructing 3D Models. Applied Sciences, 10(6), 2066. 
https://doi.org/10.3390/app 10062066 


Zhao, Y., Deng, X., & Lai, H. (2021). Reconstructing BIM from 2D structural drawings for existing buildings. 
Automation in Construction, 128, Article 103750. https://doi.org/10.1016/j.autcon.202 1.103750 


690 


PREDICTING MENTAL WORKLOAD OF USING EXOSKELETONS 
FOR CONSTRUCTION WORK: A DEEP LEARNING APPROACH 
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ABSTRACT: Exoskeletons are gaining attention as a potential solution for addressing low back injury in the 
construction industry. However, use of active back-support exoskeletons in construction can trigger unintended 
consequences which could increase mental workload of users while working with exoskeletons. Prolonged increase 
in mental workload could impact workers’ wellbeing and productivity. Prediction of mental workload during 
exoskeleton-use could inform strategies to mitigate the triggers. This study investigates a machine-learning 
framework for predicting mental workload of workers while using active back-support exoskeletons for 
construction work. Laboratory experiments were conducted wherein Electroencephalography (EEG) data were 
collected from participants wearing active back-support exoskeletons to perform flooring task. The EEG data 
underwent preprocessing, including band filtering, notch filtering, and independent component analysis, to remove 
artifacts and ensure data quality. A regression-based Long Short-Term Memory network was trained to forecast 
future time steps of the processed EEG data. The performance of the network was evaluated using root mean 
square error (RMSE) and r-squared (R’). A RMSE of 0.1527 and R° of 0.9665 indicating good fit and strong 
correlation, respectively, were observed between the predicted and actual EEG data. Results of the comparison 
between the actual and predicted mental workload also show strong correction with an R° of 0.8692. The findings 
motivate research directions into real-time monitoring of mental workload of workers during exoskeleton-use. The 
study has significant implications for stakeholders, enabling them to gain a deeper understanding of the impact of 
mental workload while using exoskeletons thereby providing opportunities for mitigation. 


KEYWORDS: Work-related musculoskeletal disorders, Exoskeleton, Mental workload, Electroencephalogram, 
Long Short-Term Memory, Flooring task. 


1. INTRODUCTION 


The prevalence of work-related musculoskeletal disorders (WMSDs) among the construction workforce is a 
growing concern in the construction industry. The US Bureau of Labor Statistics reports that workers in the 
construction industry are 1.23 times more likely to sustain WMSDs compared with workers in other industry 
sectors (BLS, 2020). The same report explains that the back is the one of the most affected body parts. Construction 
workers, such as floor layers, suffer from back injuries at 1.7 times workers in other industry sectors. For example, 
floor layers experience back injuries at the rate 22.5 MSDs per 10,000 full-time workers, and this has been known 
to result in an average of 26 days’ work absence. Exoskeletons are increasingly being perceived as a solution to 
WMSDs. Exoskeletons, such as back-support exoskeletons, are wearable devices designed to support or augment 
users’ back while performing work (Gonsalves et al., 2023; Ogunseiju et al., 2022). Exoskeletons are classified as 
passive and active depending on their mode of augmentation. Passive back-support exoskeleton, while less costly 
than active back-support exoskeletons, provide support to the back using dampers and springs. Whereas active 
back-support exoskeletons provide support to the back using electrical motors — this makes active back-support 
exoskeletons bulkier. These devices have been shown to reduce risks factors of back injuries by reducing muscle 
activity (Theurel et al., 2018), range of motion (Cumplido-Trasmonte et al., 2023), body discomfort (Gonsalves et 
al., 2021; Kim et al., 2019), and rate of exertion (Alemi et al., 2020; Baltrusch et al., 2021). Despite these benefits, 
studies have shown that exoskeleton-use in construction can trigger difficulty working in confined spaces 
(Nussbaum et al., 2019), fall risks due to the weight of the device (Alabdulkarim et al., 2019; Kim et al., 2019; 
Massardi et al., 2023), discomfort to body parts (Gonsalves et al., 2023; Gonsalves et al., 2021), restrictions in 
movement (Fox et al., 2020; Poliero et al., 2020), catch and snag risks (de Looze et al., 2016; Kim et al., 2019), 
and thermal discomfort (Liu et al., 2021). The devices could also be challenging to adjust to fit (Gonsalves et al., 
2023; Gorgey, 2018). Moreover, unequal loading and balancing of body parts due to improper adjustment can 
cause users to be more aware of the device than their task and surrounding, which could increase workers’ mental 
workload (Bequette et al., 2020; Marchand et al., 2021). 


Prolonged increase in mental workload can result in distraction, emotional distress, anxiety, and stress, which have 
downstream implication on workers’ overall well-being and performance. Real-time monitoring of workers’ 
mental workload during exoskeleton-use could inform strategies to reduce the triggers. However, scarce efforts 
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have been made to investigate models for predicting workers’ mental workload during exoskeleton-use. 
Electroencephalogram (EEG) can be used to measure brain activity corresponding to mental workload. EEG 
signals are widely used for inferring mental workload, due to the high temporal resolution, convenience, and cost- 
effectiveness of the supporting devices. Machine learning techniques, particularly deep learning, provides 
opportunities for extracting insightful features from EEG data that could be used for predicting mental workload. 
Long Short-Term Memory network, a variant of recurrent neural network, can learn long-term dependencies 
between time steps of data and predict future time-series sequences of the data. Long Short-Term Memory (LSTM) 
network has been used for sequential learning tasks like construction equipment activity analysis (Hernandez et 
al., 2019), construction workers’ safety harness usage (Guo et al., 2023), mixed reality learning environment 
(Ogunseiju et al., 2023) and, fatigue detection and early warning system (Liu et al., 2020) that need historical time- 
series data in the decision-making process. Therefore, this study investigates the extent to which workers’ mental 
workload due to exoskeleton-use can be predicted from EEG data using Long Short-Term Memory network. Using 
a case-study of a flooring task, this paper presents a comparison of the actual and predicted mental workload due 
to performing work with an active back-support exoskeleton. The results of this study contribute to scarce 
knowledge on the unintended consequences of using wearable devices such as exoskeletons for construction work. 


2. BACKGROUND 
2.1 Mental Workload Evaluation with EEG 


Mental workload are the mental resources required for task execution (Chen et al., 2017). A previous study (Fan 
& Smith, 2017) has shown that mental workload is associated with task demand and performance. Mastropietro et 
al. (2023) showed that low mental workload (underload) and high mental workload (overload) can negatively 
affect the task being executed leading to increase in rate incidence of errors. Chen et al. (2016) opined that when 
a person places too much attention on a task, the individual has less attention to focus on other stimuli. In the 
context of this study, exoskeleton-use may demand attention, thereby reducing the mental resources that may be 
required for the task or being aware of the user’s surrounding to prevent fall risks, and catch and snag risks 
(Gonsalves et al., 2023; Zhu et al., 2021). These risks can impact the mental workload resulting in exoskeleton 
users being stressed or distracted, thus retarding their productivity and safety. This makes prediction of mental 
workload a major interest in ergonomics (Young et al., 2015). Over the years, subjective and objective methods 
have been employed to infer mental workload. Subjective methods include the use of questionnaires such as NASA 
Task Load Index and Subjective Workload Assessment Technique. On the other hand, objective methods include 
the use of data collection instruments such as functional magnetic resonance imaging-fMRI, and 
electroencephalography (Ryu & Myung, 2005). However, EEG has been touted as one of the most suitable devices 
for measuring brain activities to infer mental workload (Chen et al., 2016; Qin & Bulbul, 2023). 


Borghini et al. (2014) estimates mental workload using theta-to-alpha brain waves ratio from EEG data. Similarly, 
another study (Missonnier et al., 2006) indicated that using EEG signals, an increase in mental workload is noticed 
when there is a decrease in alpha brain waves (8-13Hz) and increase in the theta brain waves (4-8Hz) during the 
execution of specific tasks. In the construction industry context, EEG has been used in some studies (Chen et al., 
2016; Chen et al., 2017; Qin & Bulbul, 2023; Yang et al., 2023) to estimate mental workload. For instance, EEG 
was used to estimate the mental workload of construction workers to on-site safety conditions (Chen et al., 2016). 
Engagement index, time-frequency indicator in EEG, was used to assess the mental workload of workers when 
exposed to construction vulnerabilities. Chen et al. (2017) used EEG approach to measure task mental load of 
construction workers based on the power spectral densities of major frequency bands. In addition, the effect of 
distractions in construction work zones on drivers’ mental workload was measured using EEG (Yang et al., 2023). 
Despite these efforts and unintended consequences of exoskeletons, there are scarce studies on predicting mental 
workload due to exoskeleton-use on construction sites. 


2.2 Machine Learning for Mental Workload Prediction 


With machine learning techniques, EEG data can be transformed into frequency domain representations which can 
enable extraction of brain rhythms. For instance, Jebelli et al. (2018a) used support vector machine, a supervised 
machine learning technique, to classify stress levels of construction workers. However, the use of supervised 
learning technique requires hand-crafting of features which could be labor-intensive and may be insufficient to 
support real-time monitoring of mental workload (Wang et al., 2023). Deep learning techniques, such as 
convolutional neural networks (CNN), have been used to extract intrinsic features from time-series data for 
recognizing occupational stress, fatigue and mental workload (Jebelli et al., 2019; Mehmood et al., 2023; Qin & 
Bulbul, 2023). Recurrent neural network, a class of CNN, is widely used for forecasting time-series data such as 
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brain activity. Recurrent neural network, such as Long Short-Term Memory (LSTM), has been noted to perform 
better in learning time-series data due to its high prediction accuracy and ability to overcome problems of 
overfitting (Wang et al., 2020). Furthermore, LSTM can solve the problem of gradient exploding and vanishing 
when processing large sequential data (Hochreiter & Schmidhuber, 1997). Jaiswal et al. (2023) noted that LSTM 
model performed better than other models in predicting cognitive fatigue. Moreover, in their study, LSTM 
eliminated the need for extensive data preprocessing and feature extraction which could have resulted in loss of 
useful information in EEG data. Also, Liu et al. (2022) used LSTM for detecting fatigue of drivers. In construction, 
Qin and Bulbul (2023) used LSTM for predicting the mental workload of workers while using augmented reality 
head-mounted display for construction assembly. Despite these possibilities, limited studies have harnessed LSTM 
for predicting mental workload associated with exoskeleton-use for construction work. 


3. METHODOLOGY 


This section, including Figure 1, describes the procedure employed to achieve the objective of the study including 
the experimental design to collect brain activity of participants performing flooring task with an exoskeleton, 
preprocessing of the brain activity data, and prediction of mental workload with the data. 


Experimental Design Data Collection Data Processing Sree nee EES Mental Workload 
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Fig. 1: Overview of Methodology. 
3.1 Experimental Design and Data Collection 


Eight male graduate students (n = 8) where recruited to perform a flooring task with an active exoskeleton. Similar 
sample sizes have been used by previous studies (Wei et al., 2020). None of the participants reported any prior 
musculoskeletal injury that would impact their participation in the study. The active exoskeleton used for the study 
is the Cray X shown in Figure 2. The Cray X, from German Bionic, weighs 7kg and can provide a lifting support 
of about 30kg. Cray X consists of a frame and strap pads of different sizes for the legs, chest, shoulders, and waist. 
The frame includes a 40V battery and motor. The exoskeleton provides different levels of support for bending, 
lifting, placing, and walking. 
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Fig. 2: Active (CrayX) back-support exoskeleton. 
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The flooring task involved lifting, placing and installing 20 floor tiles in each bay of a wooden frame comprising 
of six bays. Each bay can fit 20 floor tiles (see Figure 3). The participants were asked to lift and place 20 timber 
tiles (10kg) beside each bay, and subsequently install the stacked tiles in each bay. Each tile weighs 0.5kg. A cycle 
of flooring task includes lifting, placing, and installation of the timber floor tiles (20) in each bay. The task 
comprises of six cycles given that the participants installed the tiles in six bays. 


Fig. 3: Experimental layout of the simulated flooring task. 


Prior to commencing the tasks, the participants received instructions on how to perform the task. The participants 
performed the aforementioned flooring task with an active exoskeleton. During these conditions, the participants 
wore an EEG cap. The EEG device records electrical activity of the brain through contact between electrodes 
embedded in various portions of the cap and the scalp. The brain produces electrical signals of brain waves at 
different frequencies such as delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-13Hz), beta (13-30Hz), and gamma (>30Hz) 
(Chen et al., 2023). The delta, theta, alpha, beta and gamma bands correspond to deep sleep, powered thinking, 
alertness, concentration, and attentional processing, high mental activity, and information processing respectively 
(Ke et al., 2021). This study utilized a 14-channel EEG device measuring brain activity at 256Hz. 


3.2 Data Preprocessing 


EEG data are susceptible to contamination from intrinsic and extrinsic artifacts, particularly when subjects are 
engaged in physical activities like construction work (Jebelli et al., 2018b). These artifacts impact the quality of 
the signal. Intrinsic artifacts are triggered by movements such as eye blinking and muscle movement, while 
extrinsic artifacts are caused by external influences such as noise from wires and electrode popping. This study 
used the framework proposed by Jebelli et al. (2018b) to reduce the artifacts in the EEG data obtained from the 
simulated task. The EEG data were fed into EEGLAB, a MATLAB toolbox for processing physiological data. The 
extrinsic artifacts were removed using a Bandpass filter with cut-off frequencies of 0.5 and 65 Hz (Jebelli et al., 
2018b). Another extrinsic artifact due to noise from wires was removed using a notch filter applied at a frequency 
of 60Hz. The intrinsic artifacts were removed using independent component analysis (ICA) (Mantini et al., 2008). 
The EEG data was decomposed using Extended Infomax method into 14 components, representing the 14 channels 
of the EEG device, and displayed using a scalp heatmap (Frølich & Dowding, 2018). Preprocessed data from eight 
channels (i.e., AF3, F3, P7, O1, O2, P8, F4, and AF4) were utilized for this study. The data points from the channels 
were split into training, validation, and testing, accounting for 70%, 10%, and 20% respectively. 


3.3 Prediction of EEG Data 
3.3.1 Long Short-Term Memory network 


LSTM network, deep learning artificial recurrent neural network variant (RNN), was leveraged in this study to 
forecast subsequent values of the preprocessed EEG data. LSTM takes cognizance of the changes that could occur 
as the user gets used to the use of the device over time. LSTM neural network processes data by iterating over 
current time steps and retaining useful information to help with the processing of new data points. The regression 
LSTM neural network consists of four layers: an input layer, the LSTM layer, the fully connected layer and a 
regression layer. The input layer accepts the input time-series data and transfers this to the LSTM layer. The LSTM 
layer comprises of a cell, an entry gate, an exit gate, and a forget gate. The cell stores long-term time-series data 
and uses the gates to control flow of the data within and out of the cell. The forget gate decides which information 
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should be ignored in the cell. The LSTM layer comprises of 128 hidden units. The number of hidden units 
determines how much information or data is learned by the layer. More hidden units could result in better results 
but are more likely to result to overfitting to the training data. The fully connected layer does the discriminative 
learning in the LSTM network. It learns weights that can identify features in the training data. The regression layer 
determines the performance metrics needed for the prediction task. The LSTM neural network is trained with a 
time-series sequence of EEG data, where the outputs are EEG values of subsequent time steps. To prevent 
overfitting and divergence of the training, the predictors and targets were normalized to zero mean and unit 
variance (Jebelli et al., 2018b). Hyperparameters have a significant impact on the performance of models. The 
network was trained with the Adam optimizer, an extension of stochastic gradient descent. In addition, 200 epochs, 
as well as a learning rate of 0.001 was used. 


3.3.2 Performance evaluation 


The performance of the LSTM model was evaluated using the Root Mean Square Error and R-squared. Root Mean 
Square Error (RMSE) is a standard statistical metric for computing accuracy. RMSE is generally used to evaluate 
the difference between the actual and predicted value from the model. RMSE is determined via equation 1, where 
Ajand Pj are the actual and predicted EEG datasets respectively, and n is the number of EEG datasets. The lower 
the RMSE, the better a model would fit a dataset. The R-Squared (R°) which describes the variance in the response 
of a regression model, was computed following Renaud and Victoria-Feser (2010). The R? value ranges from 0 to 
1. The higher the R? value, the better a model fits a dataset. 


RMSE = [Y -Ad?/n (1) 


3.4 Mental Workload 


The pre-processed EEG data (from Section 3.2) and the predicted EEG data from Section 3.3.1 were decomposed 
into frequency components to determine the power spectra of the data using Welch (1967)’s approach. The 
approach uses fast Fourier transformation with a Hamming window to determine the power spectral density of the 
EEG data. The relative band power of the windowed or segmented data in theta and alpha frequency bands were 
determined. Xing et al. (2020) mentioned that theta and/or alpha power are suitable indicators of mental workload. 
The mental workload of each segment was determined by dividing the absolute power in the theta band with the 
absolute power in the alpha band. The approximate spectral limits of these frequency bands are 4-8 Hz (theta) and 
8-14 Hz (Simon et al., 2011). 


4. RESULTS AND DISCUSSION 
4.1 Performance of the LSTM Model for Each Channel 


Table 1 shows the RMSE and R? scores for the EEG channels of one of the test participants. The RMSE values are 
less than 0.3 with the P7 and O1 channels having the lowest prediction errors. A previous study has indicated that 
a RMSE value closer to zero gives a better predictive power (Miyamoto et al., 2022). Similarly, the low RMSE in 
this study shows the high predictive power of the LSTM network in predicting mental workload. The R? scores of 
the channels are more than 0.9 which indicates close alignment or similarity between the predicted and actual EEG 
values. 


Table 1: RMSE and R? scores for all the EEG channels. 


AF3 F3 P7 Ol 02 P8 F4 AF4 Average 


RMSE 0.1174 0.1404 0.0772 0.0975 0.1700 0.1876 0.2090 0.2225 0.1527 


R? 0.9061 0.9144 0.992 0.9899 0.9941 0.9832 0.9755 0.9771 0.9665 
4.2 Mental workload 


4.2.1 Comparison between Predicted and Actual PSD 


Figure 4 shows the predicted and actual power spectral density of the AF3 and O2 channels for the data of one of 
the test participants. The predicted and actual data are the red and blue lines respectively. At less than 45Hz, both 
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figures show some consistency between the predicted and actual PSD values. 
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Fig. 4: Predicted and actual power spectral density of the AF3 channel (left) and O2 channel (right). 
4.2.2 Mental workload 


The extent to which mental workload due to exoskeleton-use can be predicted is illustrated in the scatter diagram 
in Figure 5. The plot has a R? score of 0.8692 indicating a strong correlation between the predicted and the actual 
mental workload. A previous study (Coulibaly & Baldwin, 2005) has shown that R? score between 0.8-0.9 is 
termed acceptable. The result suggest that it is possible to predict mental workload during exoskeleton-use for 
construction work. Previous studies have corroborated the assertion that mental workload can be predicted 
(Borghini et al., 2014; Missonnier et al., 2006; Qin & Bulbul, 2023). 
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Fig. 5: Comparison of predicted and actual values of the mental workload. 


5. CONCLUSIONS, LIMITATIONS AND FUTURE WORK 


This study presents the extent to which mental workload due to exoskeleton-use can be predicted from EEG data 
using Long Short-Term Memory network. EEG data were obtained from an experimental study where participants 
performed flooring task with an active back-support exoskeleton. The data were preprocessed and trained with the 
Long Short-Term Memory network to identify unique features for forecasting brain activity. A comparison of the 
actual and predicted brain activity data indicates close consistency, with average root mean square error and r- 
squared of 0.1527 and 0.9665 respectively. Similar trends were observed in the comparisons of the predicted and 
actual power spectrums and mental workload. The results of this study contribute to scarce literature on the impact 
of unintended consequences of using exoskeletons for construction work. The study motivates investigations into 
the use of machine learning for real-time performance predictions of technological innovations on construction 
projects. The study may have been limited due to the sample size of eight participants which was used to train the 
deep learning algorithm. Training data from a larger sample could improve the performance of the model and its 
generalizability. This would be achieved by using time-series based data augmentation techniques such as scaling, 
permutation and generative adversarial networks. Future studies can compare the mental workload of no 
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exoskeleton and active exoskeleton conditions. In addition, further investigation on the suitability of other deep 
learning networks to identify the most suitable networks for predicting mental workload can be carried out. Besides, 
future studies can support mental workload prediction with the understanding of the risks influencing mental 
workload using subjective feedback that describes user experience of exoskeletons. 
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ABSTRACT: Computer vision-based safety monitoring requires machine learning models trained on generalized 
datasets covering various viewpoints, surface properties, and lighting conditions. However, capturing high-quality 
and extensive datasets for some construction scenarios is challenging at real job sites due to the risky nature of 
construction scenarios. Previous methods have proposed synthetic data generation techniques involving 2D 
background randomization with virtual objects in game-based engines. While there has been extensive work on 
utilizing 360-degree images for various purposes, no study has yet employed 360-degree images for generating 
synthetic data specifically tailored for construction sites. To improve the synthetic data generation process, this 
study proposes a 360-degree images-based synthetic data generation approach using Unity 3D game engine. The 
approach efficiently generates a sizable dataset with better dimensions and scaling, encompassing a range of 
camera positions with randomized lighting intensities. To check the effectiveness of our proposed method, we 
conducted a subjective evaluation, considering three key factors: object positioning, scaling in terms of object 
respective size, and the overall size of the generated dataset. The synthesized images illustrate the visual 
improvement in all three factors. By offering an improved data generation method for training safety-focused 
computer vision models, this research has the potential to significantly enhance the automation of the construction 
safety monitoring process, and hence, this method can bring substantial benefits to the construction industry by 
improving operational efficiency and reinforcing safety measures for workers. 


KEYWORDS: 360-Degree Images, Computer Vision, Synthetic Data Generation, Game Engine, Object 


Detection, Construction Safety Monitoring 


1. INTRODUCTION 


Effectively monitoring construction sites in real-time demands robust computer vision (CV) models trained on 
diverse datasets capturing different viewpoints, surface properties, and lighting conditions (Li et al., 2022; Sami 
Ur Rehman et al., 2022). This diversity ensures accurate and adaptable surveillance for improved safety and 
efficiency. However, obtaining such extensive and high-quality datasets poses significant challenges. Furthermore, 
addressing the complexities of dynamic construction environments with numerous elements and rapidly changing 
conditions is a recognized challenge. To address these issues, the utilization of smart devices for automated 
progress and real-time construction monitoring through object detection technology and positional data presents a 
rapid and precise solution (Rho et al., 2020). 


Previous methods attempted to address this issue propose synthetic data generation techniques by introducing 2D 
background randomization with virtual objects in game-based engines (Zhang et al., 2022). While these 
techniques have shown promise in controlled settings, they still exhibit shortcomings in dynamic construction 
work environments (Lee et al., 2022). This limitation might arise from the complexity of recreating the intricate 
interactions among dynamic elements and the complex spatial relationships found in construction sites. Capturing 
the subtle aspects of depth perception and object occlusion, which are essential for precise object detection, can 
pose a challenge when working with synthetic datasets (Choi & Pyun, 2021). 


This study presents a pioneering method utilizing 360-degree images to craft synthetic data, leveraging the Unity 
3D game engine, to comprehensively address these challenges. By adopting this approach, we aim to bridge the 
gap between synthetic and real-world data, enabling effective training of CV models for construction scenarios. 
The proposed method efficiently generates a sizable dataset with better dimensions and scaling, encompassing a 
range of camera positions with day and night lighting intensities. This not only facilitates improved object detection 
but also opens avenues for applications in construction progress monitoring and site safety analysis. To validate 
the proposed method, we conducted a thorough analysis, encompassing the examination of three crucial factors: 
Object Position, objects’ size scaling, and dataset size. 
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Object positioning factor scrutinizes the precision with which objects are localized within the synthetic data. It 
involves a meticulous assessment of the alignment and placement of objects in relation to their real-world 
counterparts. Higher accuracy in object positioning indicates closer adherence to reality, crucial for applications 
demanding precise spatial understanding. Object size scaling refers to the scale at which objects are represented 
within synthetic data, a critical dimension. This factor entails a meticulous examination of whether the size 
relationships between objects are faithfully replicated. Accurate scaling ensures that the synthetic data mirrors the 
real-world environment, influencing tasks where object proportions are of paramount importance. Dataset size 
describes the magnitude of the dataset generated through the proposed technique, an instrumental metric in its 
efficacy. A larger dataset not only provides a more comprehensive representation of the construction scenario but 
also potentially enhances the performance of machine learning models. It enables a broader spectrum of scenarios 
to be captured, invaluable in training robust computer vision models. 


Following the introduction and background section, which provide a thorough overview of the research problem, 
Section 2 offers an extensive literature review. Section 3 then elaborates on the proposed synthetic data generation 
method. Section 4 performs a comparative analysis between the proposed approach and conventional 
methodologies. Finally, the paper concludes by summarizing its contributions and delineating potential avenues 
for future research. 


2. LITERATURE REVIEW 


The construction industry encounters formidable challenges in procuring authentic image data, largely stemming 
from the perilous and dynamic nature of construction sites. In response, synthetic data generated through computer 
graphics emerges as a promising and cost-effective solution for training machine learning models in the 
construction sector. This approach proves especially conducive to tasks such as site monitoring, defect detection, 
and material classification, endowing models with the capability to derive invaluable insights. Consequently, this 
enhances the efficiency and efficacy of construction processes and outcomes in practical, real-world scenarios. 


A suite of techniques, including computer graphics, data simulation, data augmentation, generative adversarial 
networks (GANSs), and synthetic-to-real transfer learning, collectively contribute to the generation of synthetic 
datasets. Among these, computer graphics entails the creation of three-dimensional (3D) models of objects and 
scenes through specialized software platforms like Blender, Autodesk 3ds Max, Maya, or the Unity Game Engine. 
This process culminates in the rendering of these models to generate two-dimensional (2D) images (Frolov et al., 
2022). The advantage lies in the controlled environment it affords for data generation, enabling the creation of 
bespoke datasets with predefined attributes, including controlled variations in lighting, camera angles, or object 
orientations (Oh et al., 2021; Wong et al., 2019). Data simulation, on the other hand, involves the emulation of the 
underlying data generation processes and interrelationships between variables. This endeavor yields synthetic 
images that closely mimic the appearance and geometric characteristics of their real-world counterparts. 
Concurrently, generative adversarial networks (GANs) represent a neural network architecture comprising a 
generator network and a discriminator network. This setup empowers the generator to create novel images while 
the discriminator distinguishes between generated and authentic images, culminating in adversarial training. 
Moreover, synthetic-to-real transfer learning encompasses the training of a model on synthetic data, followed by 
fine-tuning on authentic data. This iterative process engenders synthetic data that closely approximates real-world 
data. These methodologies, whether utilized in isolation or synergy, facilitate the generation of expansive and 
diverse synthetic datasets for training and assessing machine learning models within the construction domain. 


In 2015, Soltani et al. conducted pioneering work in synthetic data generation for excavation tasks, utilizing 
Autodesk 3ds Max and Google SketchUp in conjunction with Histogram of Gradient (HOG) transformations for 
precise segmentation (Soltani et al., 2016). Since then, a surge of studies, particularly post-2020, has significantly 
advanced this field. Neuhausen et al., Kim et al., and Tohidifar et al., turned to Blender software, leveraging its 
capabilities in constructing intricate 3D models of workers (J. Kim et al., 2022; Neuhausen et al., 2020; Tohidifar 
et al., 2022). Similarly, Kim et al. extended Blender's utility by creating synthetic images of scaffolds through the 
integration of point clouds (A. Kim et al., 2022). Barrera-Animas et al., introduced an innovative methodology 
that combines Blender-based synthetic image generation with automatic labeling, effectively surmounting 
limitations associated with separate processes (Barrera-Animas & Davila Delgado, 2023). 


Wei et al. harnessed Building Information Modeling (BIM) software to create comprehensive 3D models of 
buildings at various stages of construction. These models were seamlessly integrated with Unreal Engine, 
facilitating the generation of a synthetic dataset exhibiting diverse light conditions and viewing perspectives (Wei 
& Akinci, 2022). Sutjaritvorakul et al. and Siu et al. capitalized on Unreal Engine's capabilities to generate images 
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of workers and sewerage pipes, respectively, catering to object detection through closed-circuit television (CCTV) 
cameras (Siu et al., 2022; Sutjaritvorakul et al., 2020). 


More recently, there has been a notable surge in the utilization of the Unity Perception package for synthetic data 
generation. Another study harnessed Unity Perception to intricately craft building facades in Koto City, employing 
manual annotation techniques involving polygons and bounding boxes (Zhang et al., 2022). Similarly, few more 
efforts towards generating datasets tailored for construction workers and excavation tasks, implemented bounding 
box auto-annotation methods (Assadzadeh et al., 2022; Lee et al., 2022). Notably, the Unity Perception package 
was initially introduced by Borkman et al. for synthetic dataset creation in computer vision applications (Borkman 
et al., 2021). This package emerged as an invaluable resource in generating expansive synthetic data sets for our 
machine learning models. Its rich array of features prompted us to explore its potential with 360-degree images 
within the construction domain. Noteworthy functionalities such as the Perception Camera and robust data labeling 
options facilitated the creation of a diverse and highly accurate dataset, crucial for training effective machine 
learning models. Indeed, earlier studies were constrained by their narrow focus and reliance on manual annotation 
methods, which limited their scalability and real-world applicability. Moreover, challenges in faithfully replicating 
dynamic construction environments remained unaddressed. This study pioneers 360-degree image and Unity 3D- 
based synthetic data generation for construction scenarios, overcoming challenges, and improving object detection 
for safety and monitoring. 


3. METHODOLOGY 


The research process outlined in this paper is delineated into five distinct stages, as illustrated in Figure 1. The 
research process began with Data Collection and Preprocessing, where real-world construction site imagery was 
gathered and prepared for further analysis. Subsequently, Virtual Construction Site Simulation was conducted to 
create a digital representation mirroring real-world scenarios. This simulation served as the foundation for 
generating annotated data through the integration of the Unity Perception package. The package's capabilities, 
including Perception Camera and data labeling options, played a pivotal role in producing a diverse and accurately 
annotated dataset. Finally, the Results and Evaluation section scrutinized the performance of the machine learning 
models trained on the synthetic data, providing valuable insights into their effectiveness in dynamic construction 
environments. Furthermore, Figure 2 illustrates the complete system architecture. 
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Fig 1: Research Process 


3.1 Data collection and preprocessing 


The initial phase of constructing our synthetic dataset involved the acquisition of authentic 360-degree images 
from real-world construction sites. This imperative task was accomplished through on-site visits to various 
construction locations, where we employed a Ricoh Theta V camera to capture high-fidelity images. A 
comprehensive set of approximately 49 shots was meticulously taken across three distinct sites. These captures 
were strategically dispersed across three distinct sites, a deliberate maneuver aimed at ensuring a panoramic 
representation of both indoor and outdoor contexts. This strategic diversity not only broadens the dataset's scope 
but also fortifies its ecological fidelity, essential for simulating real-world scenarios accurately. 


To uphold the dataset's integrity and applicability, a discerning filtering process was meticulously implemented. 
This process entailed the deliberate exclusion of instances featuring the target classes or objects slated for 
subsequent annotation. Moreover, as a supplementary measure to bolster the dataset, an additional 20 images were 
judiciously selected from reputable online sources. These images underwent a rigorous refinement process, 
facilitated by Adobe Photoshop, a widely recognized image editing tool. This intervention was crucial in the 
removal of extraneous elements that might have inadvertently compromised the dataset's fidelity. This meticulous 
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curation was necessitated by the publicly accessible nature of these images. 


By rigorously adhering to these meticulous acquisition and preprocessing protocols, we established a robust 
foundation for the subsequent stages of annotation and synthetic data generation. This methodological rigor 
ensures that the dataset remains of the highest quality and relevance, serving as an indispensable resource for the 
accurate training and evaluation of our computer vision models in dynamic construction environments. 


3.2 Virtual construction site simulation: 


During this pivotal phase, the synthesis of virtual components was meticulously undertaken to construct a dynamic 
virtual construction site. This critical stage comprised four foundational elements, each of which played a 
significant role in enhancing the authenticity and diversity of the synthetic dataset. A situation as a case scenario 
where workers are actively engaged in ladder climbing was meticulously composed to clearly capture the real 
atmosphere of a construction site environment. 


3.2.1 Avatar Placement: 


The foundational step involved generating 3D avatars with distinct physical attributes and appearances through 
Ready Player Me. To augment avatar diversity, human parts from SketchFab were incorporated, with 
customization of attire and features. Avatars were tailored to emulate team members' characteristics using the Unity 
3D avatar maker plugin, and adjustments to facial features and joints were made using Blender and Unity 3D. Six 
avatars were strategically positioned within the hollow cylinder structure. This included four male workers 
alternating between two ladders switching on/off at predetermined intervals. Helmets, a safety precaution, were 
optionally worn during simulation. Dataset diversity was amplified by introducing random alterations in clothing 
color, pattern, skin complexion, and hair shade. This amalgamation of diverse and realistic avatars not only 
diversifies the synthetic dataset but also augments its capacity to simulate a broad spectrum of personnel scenarios 
within a construction site environment. 


3.2.2 Animation Rigging: 


In the subsequent phase, a climbing animation was seamlessly integrated into the avatars within the unity game 
engine. A pre-existing climbing animation from Mixamo! was utilized and imported into the Unity engine. The 
animation was then rigged to the avatars, with meticulous adjustment of parameters to ensure perfect 
synchronization with the ladder model The seamless integration of climbing animations fortifies the dataset by 
providing a realistic representation of ladder-related actions. This, in turn, equips ML models with the proficiency 
to accurately identify and classify such actions within the construction site environment. 


3.2.3 Texture Mapping: 


The Texture Mapping phase was dedicated to seamlessly enveloping the 360-degree background images around 
the virtual cylinder. Distinct materials were assigned to each image, with meticulous organization within a 
designated folder named MaterialsBG in the Unity project. To infuse the synthetic data with variety, a script was 
implemented to randomly select materials from the MaterialsBG folder, generating a diverse array of image frames. 
This strategic process played a pivotal role in cultivating a truly immersive virtual environment. The judicious 
mapping of the 360-degree images onto the surface of the cylinder heightened the authenticity of the synthetic 
dataset, furnishing it with meticulously curated training data of superior quality for ML models. 


3.2.4 Randomization and Lighting Conditions: 


Dynamic lighting effects were harnessed to infuse our virtual construction site with an authentic and immersive 
ambiance. The Unity engine's built-in directional light was adeptly employed to replicate a spectrum of natural 
lighting conditions, spanning dawn, noon, afternoon, and night. A meticulously crafted script dynamically 
modulated the light intensity, mimicking the natural progression of light over time. This meticulous attention to 
lighting dynamics further fortified the efficacy of the synthetic dataset, as CV based techniques rely on precise 
lighting cues for accurate object identification and localization. 


1 https://www.mixamo.com/#/ 
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3.3 Generating annotated data through unity perception PACKAGE INTEGRATION: 


The incorporation of the Unity Perception Package stands as a transformative stride in elevating the authenticity 
and intricacy of our synthetic dataset. This comprehensive toolset within the Unity environment streamlines the 
process of automatic labeling and annotation of objects based on pre-defined classes, a critical step for training 
computer vision models. 


At the core of this integration lies the Camera Sensor component. This crucial element faithfully simulates the 
behavior of a real-world camera, capturing images from the scene with a level of fidelity that closely mirrors actual 
visual acquisition. This technical facet not only enriches the dataset's realism but also empowers our models to 
process images in a manner akin to their real-world counterparts. Complementing the Camera Sensor is the Labeler 
Component, an indispensable tool for training computer vision models. This component meticulously annotates 
objects within the scene, a vital step in providing the models with ground truth information for accurate object 
detection and classification. This facet significantly enhances the dataset's utility in training robust models capable 
of discerning and categorizing objects within complex construction environments. Moreover, the Renderer 
Component stands as another critical element in this integration. This component harnesses the Unity Perception 
Package's neural rendering pipeline to meticulously render the scene. By doing so, it imbues the generated images 
with a level of visual fidelity that is essential for training models to recognize and understand complex, real-world 
scenarios. 


The Labeler Component acts like a foundation in the process of training computer vision models. It interfaces 
seamlessly with a predefined set of classes and automatically identifies objects within the scene, assigning them 
appropriate labels based on their class membership. This automated labeling process significantly amplifies the 
dataset's efficacy in training robust models capable of discerning and categorizing objects within the dynamic and 
intricate environments of construction sites. The Labeler Component operates in tandem with scripts written in C#, 
leveraging the Perception API provided by the package. These scripts access object information and apply labels 
to them based on their class. The API includes functions and methods that facilitate this automatic annotation 
process. Through the orchestrated interplay of the Labeler Component, Perception API, and scripts, our synthetic 
dataset gains a precise and detailed ground truth, indispensable for training computer vision models. This annotated 
data forms the cornerstone of our dataset, empowering our models in object detection and environmental 
perception tasks within the complex and safety-critical context of construction sites. Through the orchestrated 
interplay of the Labeler Component, Perception API, and scripts, our synthetic dataset gains a precise and detailed 
ground truth, indispensable for training computer vision models. This annotated data forms the cornerstone of our 
dataset, empowering our models in object detection and environmental perception tasks within the complex and 
safety-critical context of construction sites. 


As aresult of this integration, the Unity Perception Package generates several key JSON files alongside the image 
dataset. These files, including 'annotation_definitions.json’, 'captures.json’, 'metric_definitions.json’, 'metrics.json’, 
and 'sensors.json', hold crucial annotation data, capture details, metric definitions, recorded metrics, and sensor 
information respectively. They play a pivotal role in enriching our dataset for object detection model training and 
evaluation. The annotation_definitions.json file, for instance, provides detailed information about class labels and 
object attributes, while captures.json contains information regarding the captured scenes. These JSON files 
collectively form the backbone of our dataset, empowering our models with the necessary information to 
accurately perceive and interpret the dynamic construction environment. 
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Fig 2: System Architecture explaining whole process of synthetic dataset generation. 
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4. RESULTS 


The synthetic data generated through the proposed approach utilizing 360-degree images was focused on three 
critical factors: Object Position (Accuracy), Object Size (Scaling), and Dataset Size. The results from proposed 
techniques are shown in Figure 3. 


4.1.1 Object Positioning: 


Upon thorough examination, it became evident that the synthetic data derived from 360-degree images 
demonstrated excellent accuracy in object position determination. The comprehensive perspective offered by 360- 
degree images facilitated more precise object localization within the construction environment. This was 
particularly notable in instances where objects were positioned in close proximity or within complex spatial 
configurations. The enhanced depth perception afforded by the spherical perspective of 360-degree imagery played 
a pivotal role in accurately pinpointing object locations. Moreover, the capability to observe objects from diverse 
angles within a single image frame contributed to a deeper comprehension of their spatial relationships within the 
construction site. This heightened accuracy in object positioning is a critical advantage, especially in safety-critical 
environments where precise object localization is paramount for accident prevention and worker safety. 


4.1.2 Object’s Size Scaling: 


One of the notable advantages of leveraging 360-degree images was the observed improvement in object size 
scaling. The spherical perspective offered by 360-degree imagery allowed for a more precise representation of 
object sizes in relation to their immediate surroundings because it encompassed a comprehensive view of the entire 
environment without the distortions. This panoramic view allowed for a faithful portrayal of how objects interacted 
within the construction site, aligning closely with their real-world proportions. This was especially pertinent when 
considering objects from varying angles or perspectives. The spherical view granted by 360-degree images 
mitigated distortions, ensuring that object sizes were accurately depicted regardless of the viewing angle. In 
contrast, data generated from 2D background images often grappled with limitations in faithfully representing 
object sizes, particularly when objects were viewed from non-standard angles. Additionally, the ability to view 
objects from multiple angles in a single image frame further contributed to this enhanced accuracy in size 
representation. This critical improvement in object size scaling with 360-degree imagery contributes significantly 
to the overall fidelity and accuracy of the synthetic dataset, ultimately enhancing the effectiveness of computer 
vision models in interpreting real-world construction scenarios. 


4.1.3 Dataset Size: 


The size of dataset played a pivotal role in influencing the performance of computer vision models. The use of 
360-degree images facilitated the creation of a larger dataset. A significant contributor to this considerable dataset 
expansion is the unique capability of 360-degree images. A single 360-degree image can generate multiple frames, 
each offering different perspectives while maintaining consistent lighting and camera positions. This efficiency 
greatly enhances the adaptability and effectiveness of models in real-world construction site scenarios. The 
substantial dataset derived from 360-degree images significantly broadened the ability of models to generalize and 
make well-informed decisions across a wide range of construction site scenarios. Undoubtedly, this rich training 
data strengthened the robustness and competence of models, establishing it as a valuable tool for real-time 
construction site monitoring. 
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Fig 3: Results of the proposed approach. 


5. DISCUSSION AND CONCLUSION 


This study introduces a 360-degree image-based approach for generating synthetic data to train CV models for 
construction site safety monitoring. The 360-degree dataset excels in object positioning due to its ability to provide 
a comprehensive view of the construction site from multiple perspectives. This advantage is particularly valuable 
for precise object localization in complex spatial arrangements common in construction settings. Additionally, the 
spherical perspective inherent in 360-degree images ensures a more accurate representation of object sizes relative 
to their surroundings. This accuracy is crucial for effective monitoring in construction environments, as it aligns 
object proportions faithfully with real-world dimensions. Furthermore, the larger dataset size derived from 360- 
degree images offers extensive and diverse training data, enabling the model to generalize effectively across 
various construction scenarios. Unlike 2D images that produce only one synthetic image for one scenario, each 
360-degree image can generate multiple synthetic images for one scenario. 


The advantages of employing 360-degree images as the foundation for synthetic data generation are multifold. The 
all-encompassing view offered by 360-degree imagery allowed for a more holistic representation of the 
construction environment. This comprehensive perspective, combined with the ability to accurately depict object 
positions and sizes, resulted in a dataset that offered a remarkably realistic simulation of real-world scenarios. 


It is important to acknowledge that while this approach demonstrates notable progress, it does have certain 
limitations. A limitation of using 360-degree imagery, versus prior 2D background images, is handling dynamic 
elements on construction sites. While offering a broad view, 360-degree images may introduce distortion, 
especially at the edges, impacting object detection accuracy, especially for peripheral objects. Addressing and 
correcting this distortion during data generation is crucial for precise computer vision model training. Future 
research aims to expand the dataset, explore advanced model architectures, and assess data efficacy through 
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computer vision-based model training on actual construction sites. Integration of multimodal information, like 
combining imagery with LiDAR or other sensors, will enrich the dataset and enhance object detection accuracy. 
Furthermore, we will objectively measure the effectiveness of the proposed approach by conducting computer 
vision-based model training and comparing the results with the traditional method of generating synthetic data. 
This approach holds the potential to serve as a foundational framework for numerous industries, offering a 
blueprint for the modernization of their safe operations, thereby propelling safety standards and operational 
prowess to new heights on an international scale. 
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ABSTRACT: Construction planning and scheduling are crucial aspects of project management that require a lot 
of time and resources to manage effectively. Machine learning and artificial intelligence techniques have shown 
great potential in improving construction planning and scheduling by providing more accurate insights into project 
progress and forecasting. This paper proposed a machine learning model that utilizes regularly updated site data 
to generate predictions of quantity variances from the plan and enable a more accurate forecast of future progress 
based on historical data on concrete activities. Also, the outputs of this model can be used when creating a schedule 
for a new project. New schedules created with the help of this model will be more consistent and reliable due to its 
vast data pool and ability to generate realistic forecasts from this data. The model utilizes data from completed 
and other ongoing projects to generate insights and provide a more accurate and efficient construction planning 
and scheduling solution. Within the scope of this study, different attributes of concrete pouring activities of different 
projects and locations were used as input data for a machine learning process, and then, using this model on test 
data, the forecast concrete quantities were obtained. This model provides a more advanced solution than 
traditional project management tools by incorporating machine learning techniques while significantly improving 
construction planning, scheduling accuracy, and efficiency, leading to more successful projects and increased 
profitability for construction companies. 


KEYWORDS: Machine Learning, Planning, Scheduling, Forecasting, Data Visualizing, Construction, Business 


Intelligence 


1. INTRODUCTION 


Developing project schedules is critical to all projects, including engineering, manufacturing, construction, and 
others (Faghihi, Reinschmidt & Kang, 2014). Creating a reliable schedule and then updating and monitoring it as 
the project progresses is crucial for project management. A continuous data flow from the site is necessary to 
monitor project progress correctly. This process creates a vast amount of data. The construction industry deals with 
significant data from various disciplines throughout the life cycle of a facility (Bilal et al., 2016). Despite the 
abundance of data generated, its utilization in construction projects is often overlooked, resulting in a staggering 
amount of unused information. It is postulated that 96% of the data collected during construction projects goes 
unused (Snyder et al., 2018). In order to harness the potential of this unused data, various techniques such as 
statistics, machine learning, and artificial intelligence can be employed. Statistics are already commonly studied 
and applied within the construction sector. However, the importance of machine learning (ML), more generally, 
artificial intelligence (AI), is mostly overlooked and not being applied by companies as necessary, despite the 
studies on the matter. 


Machine Learning applications have proven to outperform existing techniques, methods, and human decision- 
making on construction sites (Hammad et al., 2014). These methodologies offer valuable tools for further 
processing the data, enabling applications such as forecasting, risk analysis, labor allocation, and defect analysis. 
By leveraging these processes, construction professionals can unlock insights and optimize decision-making 
throughout the project lifecycle. Exploiting the power of ML data analytics tools can result in significant corporate 
benefits by enhancing the time performance of construction projects—regarded as one of the critical indicators of 
a successful project (Gondia et al., 2019). The most important part of construction project scheduling is the 
selection of resources (e.g., workforce, machines) and harmonizing their work (Jaskowski & Sobotka, 2006). This 
study aims to create more accurate forecasts for concrete pouring activities for effective planning, such as resource 
allocation in power plant projects. Often, project planners lack detailed drawings and necessary quantities at the 
beginning of the project. Even if such information is available initially, these quantities frequently change the 
project due to various factors. These factors may include unexpected soil features, inexperienced workforce, 
supplier delays, adverse weather conditions, or suboptimal planning. With the help of Machine Learning, 
correlations were sought between planned and at-completion quantities for data obtained from construction 
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projects. Data was collected from various ongoing or completed projects a construction company undertook to 
accomplish this. This data served as valuable input for machine learning models, enabling us to obtain meaningful 
and actionable results. 


2. MATERIAL AND METHODS 


Firstly, data was anonymously collected from 4 different projects (all personal and company-related information 
was removed). The projects are power plants in the following locations: 


e Tashkent, Uzbekistan, 
e Ashgabat, Turkmenistan, 
e Sulaymaniyah, Iraq (two projects). 


The primary data was obtained from the SAP Database ("SAP: Enterprise Application Software," n.d.) of the 
company, which is updated weekly for every project. SAP export data consisted of detailed weekly progress of the 
projects. Another data source is the Oracle Primavera ("Primavera P6 Enterprise Project Portfolio Management," 
n.d.) database. The company database has detailed L3 Updated Schedules and Baselines for each project. These 
schedules can provide planned and at-completion durations and start/finish dates if needed. At the date of this 
study, one of the projects was completed, and the other two were still ongoing, so data until the latest data date 
(30.06.2023) was used even though some of the activities were not completed. 


Firstly, concrete pouring activities were filtered. The projects and schedules were taken from the same company 
and created according to the same procedures. Thus, all concrete pouring activities’ wording and coding format 
are the same, as shown in Table 1. 


Table 1: Activity ID and Name Structure of the Schedules 


Activity ID Activity Name 

BZC-U-C-UBE- 1800 Excavation of Soil - Foundation Level - Control Building 
BZC-U-C-UBE-1810 Filling & Compaction - Foundation Level - Control Building 
BZC-U-C-UBE- 1860 Lean Concrete Pouring - Foundation Level - Control Building 
BZC-U-C-UBE- 1820 Installation of Formwork - Foundation Level - Control Building 
BZC-U-C-UBE- 1830 Installation of Rebar - Foundation Level - Control Building 
BZC-U-C-UBE- 1840 Concrete Pouring - Foundation Level - Control Building 

BZC-U-C-UBE- 1870 Installation of Formwork for Column - Ground Floor - Control Building 
BZC-U-C-UBE- 1880 Installation of Rebar for Column - Ground Floor - Control Building 
BZC-U-C-UBE- 1890 Concrete Pouring for Column - Ground Floor - Control Building 
BZC-U-C-UBE-2040 Installation of Formwork for Beam & SLCA - Ground Floor - Control Building 
BZC-U-C-UBE-2050 Installation of Rebar for Beam & Slab - Ground Floor - Control Building 
BZC-U-C-UBE-2060 Concrete Pouring for Beam & Slab - Ground Floor - Control Building 


The wording format of the data is as follows; 
"Activity Description" — Level/Element — Building 


The concrete activities start with "Concrete Pouring" as Activity Description. After the activity description, another 
attribute can be "level" or "element": foundation, column, slab, pedestal, wall, or trench. Moreover, there are 
different buildings of different sizes and floors. However, these projects mostly have concrete structures for 
mechanical and electrical equipment foundations. The data pool consisted of 263 activities; 180 were foundation 
concrete, and the other 83 were the other types of concrete activities. 


RapidMiner (Mierswa & Klinkenberg 2018) was used as a tool for further processes. Rapidminer is a program that 
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SECTION C - Al, DATA € 


enables modifying the data and applying various Machine Learning techniques with a simple interface. 


Raw data consists of planned quantities for each activity, weekly realized quantity for every data date, at 
completion quantity, project name, and project country for every activity. The data was manually transformed to 
distribute the attributes in the activity names to different columns for the machine-learning processes. In the early 
stages of the projects, planned quantity values were set to "1" for some of the foundation activities due to the 
unavailability of concrete quantity data during the baseline schedule development. As the concrete pouring 
activities progressed, these 'at completion’ quantities for these specific activities were updated to reflect the actual 
volumes poured. 


These activities need to be considered as outliers. The outlier is the data far from the average value of a statistics 
group. Outliers may affect the statistics and results substantially; therefore, they must be removed from the pool. 
Normalization is required to detect outliers in data pools with actual values, such as the one in this study, to ensure 
that variables with different scales are brought to a standard scale, preventing biased results. 


In order to apply this process, the model in Fig. 1 was created. 
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Fig.1: Outlier Detection Model 


Due to the abundance of foundation concrete activities, the algorithm considered the "non-foundation" entries as 
outliers in a previous model. Thus, a "foundation concrete activities" filter was applied to detect outliers only 
among the foundation activities. The model detected five excessive values (which have value of 1 m° as Planned 
Quantity) as outliers, and these rows were deleted. After clearing the outlier entries, the new model in Fig.2 was 
created with clean data. 
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Fig.2: Random Forest ML Model 
The input data consists of 5 different columns; 


l- Activity ID: It is a unique ID for each activity. 

2- Country: It includes the country project taking place. 

3- Element Type: Element Type is the type of concrete element, which can be the foundation, column, slab, 
pedestal, wall, or trench. 

4- Planned Quantity: It is the quantity planned at the beginning of the project, according to the baseline 
schedule. 

5- At Completion Quantity: At Completion Quantity is the actual quantity on the site, which often differs 
from the planned quantity for various reasons. 


Activity ID is unique for every row; the country column may include Iraq, Turkmenistan, or Uzbekistan. Element 
Type is the type of concrete element, which can be a foundation, slab, column, etc. Planned quantity is the quantity 
specified and planned at the beginning of the project, and At Completion, Quantity is the updated actual quantity 
throughout the project. 
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The model learns through the upper arm and applies the process to the Test Data (Table 2) below at the merging 
point (Apply model). Test Data consists of not started activities from the same three countries; thus, the table does 
not have at-completion quantities, leaving filling this column to the ML model. 


Table 2: Structure of the Test Data 


Planned 
Activity ID Country Element Type 
Quantity 
T_Actl Iraq Foundation 286.7 
T_Act2 Iraq Foundation 35.89 
T_Act3 Iraq Foundation 201.49 
T_Act4 Turkmenistan Foundation 31.25 
T_Act5 Turkmenistan Foundation 90.89 
T_Act6 Turkmenistan Foundation 219.19 
T_Act7 Iraq Slab 40.8 
T_Act8 Turkmenistan Slab 23.5 
T_Act9 Uzbekistan Slab 84.5 
T_Actl0 Iraq Column 99.21 
T_Actl1 Turkmenistan Column 137.69 
T_Actl2 Uzbekistan Foundation 140.7 
T_Act13 Uzbekistan Column 175.55 
T_Actl4 Iraq Pedestal 121.79 
T_Actl5 Turkmenistan Pedestal 132.37 
T_Actl6 Uzbekistan Pedestal 212.58 
T_Actl7 Uzbekistan Column 90.05 
T_Actl8 Turkmenistan Foundation 196.72 
T_Actl9 Iraq Foundation 114.84 
T_Act20 Turkmenistan Foundation 108.51 
T_Act21 Uzbekistan Foundation 24.18 
T_Act22 Turkmenistan Wall 131.3 
T_Act23 Uzbekistan Wall 286.66 
T_Act24 Turkmenistan Pedestal 55.83 
T_Act25 Uzbekistan Pedestal 13.6 
T_Act26 Iraq Trench 272.11 
T_Act27 Uzbekistan Trench 155.09 
T_Act28 Turkmenistan Foundation 113.56 
T_Act29 Turkmenistan Foundation 91.76 
T_Act30 Uzbekistan Foundation 96.38 


Random Forest regression was selected for the prediction process because more than two parameters affect the at- 
completion quantity: Country, Element Type, and Planned Quantity. Random Forest Regression is a widely used 
model in regression and classification problems. The accuracy of predictions increase when there are multiple 
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decision trees. 


3. RESULTS AND DISCUSSION 
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Although the amount is enough for consistency and prediction, usage of a broader data pool enables all the 
attributes to show effects on the results more clearly. For example, the projects' countries have many hidden 
variables affecting the quantities; however, this effect could not be seen clearly with only three countries. Also, 
most of the entries are for foundation concrete activities, so wall or column quantities did not affect the results as 
intended. Then, using the ML model with the input data in the test table, quantities of the activities were calculated 
at completion. The Final Table is given in Table.3 Forecasted "At Completion Quantities" are shown in the 


Prediction Column. 


Table 3: Predictions on the Testing Data 


Activity ID Country Element Type Planned Quantity Prediction 
T_Actl Iraq Foundation 286.7 494.71 
T_Act2 Iraq Foundation 35.89 59.04 
T_Act3 Iraq Foundation 201.49 226.17 
T_Act4 Turkmenistan Foundation 31.25 34.41 
T_Act5 Turkmenistan Foundation 90.89 91.79 
T_Act6 Turkmenistan Foundation 219.19 269.12 
T_Act7 Iraq Slab 40.8 60.73 
T_Act8 Turkmenistan Slab 23.5 31.13 
T_Act9 Uzbekistan Slab 84.5 112.65 
T_Actl0 Iraq Column 99.21 159.52 
T_Actl1 Turkmenistan Column 137.69 130.16 
T_Act12 Uzbekistan Foundation 140.7 160.31 
T_Act13 Uzbekistan Column 175.55 208.09 
T_Actl4 Iraq Pedestal 121.79 206.10 
T_Actl5 Turkmenistan Pedestal 132.37 124.42 
T_Actl6 Uzbekistan Pedestal 212.58 250.10 
T_Actl7 Uzbekistan Column 90.05 122.94 
T_Actl8 Turkmenistan Foundation 196.72 227.03 
T_Actl9 Iraq Foundation 114.84 221.01 
T_Act20 Turkmenistan Foundation 108.51 66.39 
T_Act21 Uzbekistan Foundation 24.18 154.24 
T_Act22 Turkmenistan Wall 131.3 121.42 
T_Act23 Uzbekistan Wall 286.66 B2275 
T_Act24 Turkmenistan Pedestal 55.83 74.38 
T_Act25 Uzbekistan Pedestal 13.6 32.94 
T_Act26 Iraq Trench 272.11 422.71 
T_Act27 Uzbekistan Trench 155.09 208.74 
T_Act28 Turkmenistan Foundation 113.56 77.91 
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T_Act29 Turkmenistan Foundation 91.76 91.79 


T_Act30 Uzbekistan Foundation 96.38 130.45 


After inspecting the results and the graph in Fig.3, it was seen that they are consistent. Activities with Iraq in the 
Country column tend to differ the most from the baseline plan because projects in Iraq suffered from substantial 
design changes until their completion. On the other hand, the difference is lower in Turkmenistan activities because 
the baseline plan for the Turkmenistan project was closer to the realized work. Therefore, even if the graph has 
some more significant gaps, they are because of country and project differences. However, a more extensive data 
pool would enable predictions with less error if available. The rows with trench, pedestal, and walls do not have 
as much input data as foundation concrete activities; thus, these predictions may not be as accurate as foundation 
concrete activities. This study used country and element types as supplementary features to the planned quantities 
dataset. However, it is worth noting that including more comprehensive variables, such as detailed weather 
conditions, workforce experience, and material strength, which are known to impact quantities at project 
completion significantly, can further enhance the predictive accuracy of the model. 


Model Results 
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Fig.3: Graph of the Predictions 


4. CONCLUSIONS 


This study collected data from various ongoing and completed construction projects a construction company 
undertook, encompassing parameters related to concrete pouring activities. Utilizing advanced supervised learning 
algorithms, Machine Learning models were trained to establish correlations between planned and at-completion 
concrete quantities. While the model shows promising potential, it is important to note that future research could 
obtain more realistic and accurate results with more extensive and diverse data. The optimization of quantity 
forecasting is a key outcome, and the integration of Machine Learning-based forecasting offers powerful decision 
support for project management, enabling proactive measures to minimize delays and resource shortages. The 
successful implementation of Machine Learning underscores the importance of data analytics tools in the 
construction industry, which, with further exploration and expanded data availability, can lead to improved project 
management practices, resource utilization, and more profitable projects. It is essential for future research to 
address data limitations and consider real-time data integration to enhance the reliability and effectiveness of 
Machine Learning applications in the construction sector. 
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ABSTRACT: During building project planning, various standards, such as material specifications, value ranges, 
and construction regulations, must be considered. When analyzing a regulation for its BIM-based use, it must be 
identified which information can be checked directly or indirectly using a BIM model. The basis for the directly 
checkable information requirements is the explicit description of object classes, object types, properties, and values. 
Additionally, complex validation rules can be derived from the standards. These information extractions are mostly 
performed manually and laboriously on text-based regulatory documents. To provide a better data format, the 
NISO proposed the Standard Tag Suite (NISO-STS), which is an XML format for publishing and exchanging full- 
text content and metadata of standards. This paper proposes a concept to enrich standards in NISO-STS format 
with information requirements and validation rules to provide a machine-interpretable semantic knowledge base 
for BIM processes. To achieve this, the concept utilizes natural language processing (NLP) methods to extract 
semantic information from the standards. Furthermore, the paper introduces a workflow to transfer the gathered 
knowledge into the XML-based standard. This allows the acquired semantic knowledge to be used BIM-based and 
directly updated in future versions of the standards. To show the applicability of the concept an approach is 
presented in which the obtained information is stored and used as a queryable knowledge base. The resulting 
database is used by a querying assistant, in which a user can enter keywords and questions that are translated 
into SPARQL queries to provide answers for the given input. 


KEYWORDS: Natural Language Processing (NLP), NISO-STS (Standard Tag Suite), Smart Standards, Rule- 
based model checking, Semantic knowledge 


1. INTRODUCTION 


In civil engineering, managing and exchanging information is a non-trivial task due to the complex nature of 
construction projects and the involvement of numerous stakeholders (Alani et al., 2021; Tomczak et al., 2022). 
These stakeholders in the Architecture, Engineering, Construction, and Operations (AECO) industry put their 
priority on ensuring compliance with standards, regulatory documents, and other requirements (Beach et al., 2015). 
However, formalizing the Information Requirements (IR) and intricate validation rules poses a significant 
challenge and has hindered the widespread adoption of Building Information Modeling (BIM) practices (Tomczak 
et al., 2022). The current manual compliance checking process is prone to errors, time-consuming, and costly. Due 
to the high effort required for the testing processes, in practice testing is only carried out on a random sample and 
not in its entirety (Z. Zhang et al., 2022, Fauth, 2021). These drawbacks have motivated extensive research into 
Automated Compliance Checking (ACC). 


Acquiring precise IRs and validation rules from guidelines and standards remains a significant obstacle, 
particularly considering that many of these documents exist in non-machine-readable formats (Schénfelder & 
König, 2021). Thus, there is a need for a machine-readable and interchangeable format of representation for 
regulatory documents and standards to extract the desired information for ACC checking. To tackle the challenge 
of not machine-readable standards, the German Institute for Standardization (DIN) has started the Initiative Smart 
Standards (Czarny et al., 2021). Their objective is to convert their published standards into machine-readable 
documents. To fulfill this, they have adopted the NISO Standard Tag Suite (NISO-STS), an XML-based extendable 
data format introduced by the National Information Standards Organization (NISO). This standardized format is 
designed to present and preserve the content and metadata of standards and regulatory documents (NISO Standards 
Tag Suite Working Group, 2017). 


This paper presents a concept for enhancing standards and regulatory documents in the NISO-STS format with IR 
and validation rules. The aim is to establish a machine-interpretable semantic knowledge base that can be 
seamlessly integrated into BIM processes. To achieve this, natural language processing (NLP) methods are 
facilitated to extract relevant semantic information from standards in NISO-STS representation. Furthermore, the 
concept introduces a workflow to transfer the gathered knowledge back into the XML-based standard to link the 
extracted IRs and rules with the original text. This allows the extracted semantic knowledge to be used on a BIM 
basis and updated directly when new versions of the standard are created. To show the applicability within BIM 
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workflows, the paper concludes with a demonstrator that can be used to query the semantic knowledge to obtain 
relevant validation rules and IRs. Overall, this paper aims to address the challenges faced in ACC, information 
management, and integrating digital information into the BIM workflow. 


2. BACKGROUND 


This section presents the key terms, concepts, and relevant research for the concept presented in this paper. Initially, 
the section provides an overview of the Standard Tag Suite developed by the NISO, followed by an introduction 
to the Initiative Smart Standards proposed by the DIN, which employs this tag suite. Following, an introduction 
to NLP-based information extraction and knowledge representation is provided. The chapter concludes with a 
presentation of the current state of research on the topic of code compliance. 


2.1 Digital standards 


As denoted in Section 1 there is a need for machine-readable, interchangeable, and maintainable regulatory 
documents and standards for ACC (Schénfelder & König, 2021). For this purpose the NISO in particular the NISO 
Standards Tag Suite Working Group published the NISO Standards Tag Suite (NISO-STS) in 2017 (NISO 
Standards Tag Suite Working Group, 2017). The NISO-STS defines a set of XML elements and attributes that 
describe the complete content and metadata of standards. This includes co-produced standards and standards 
bodies' adoptions of existing standards, to establish a universal format for publishing and exchanging standards 
content in all shapes. The primary objective of the NISO-STS is to preserve the content of standards, irrespective 
of how they were created and delivered. It enables the acquisition of structural and semantic components without 
being bound to a specific order or textual arrangement. The standard consists of two implementations, referred to 
as the Interchange Tag Set and the Extended Tag Set. These Tag Sets are constructed from the elements and 
attributes defined in the NISO-STS and are designed to function as models for publishing and enhanced 
interoperability of standards and regulatory Documents (NISO Standards Tag Suite Working Group, 2017). 


Within Initiative SMART Standards (IDiS) of the DIN, the concepts and implementations of the NISO-STS are 
used to advance the digitalization of German regulations and standards (Czarny et al., 2021). The IDiS facilitates 
the establishment of digital standards, which offer information necessary for standardization tasks in a suitable 
format and scope. A whitepaper has been developed to foster a common understanding and clear action scenarios 
for implementing digital standards. The document provides a comprehensive understanding of various scenarios 
concerning standards, encompassing aspects such as maturity, readability, feasibility, interpretability, and even the 
potential for machine-driven creation. It also addresses the different levels of autonomy in the creation and 
application of standards and regulatory documents. These levels span from level 0, representing the traditional 
paper-based format, to a potential level 5, depicting a future scenario where standards are directly influenced and 
optimized by artificial intelligence (AI). Currently, the DIN is actively engaged in converting all its rules and 
regulations into a machine-readable XML serialization using the aforementioned NISO-STS model. To effectively 
implement rules for the ACC of building information models, digital standards at level 3 or higher are necessary. 
This indicates that the standards must reach a level of autonomy where they can be interpreted and applied by 
humans and machines (Czarny et al., 2021). 


In this contribution, Autonomy Levels 2 and 4 are utilized. A document in Level 2 is a machine-readable XML 
document and allows the extraction of its textual content and other structural elements for further processing. 
Within the document’s chapters, sentences, graphics, and tables are distinguishable which simplifies a separate 
examination of the individual components. A Level 4 document contains, in comparison to a Level 2 document, 
not only machine-readable but also machine-interpretable content that enables a close linkage with execution and 
application information. These features allow seamless integration of the contained information into other 
information systems and software tools (Czarny et al., 2021). 


2.2 NLP-based information extraction 


In the construction industry, the checking of specifications from building codes, regulations, and standards plays 
a crucial role. Stakeholders involved in a construction project must adhere to precise guidelines in both design and 
realization, and adherence to these guidelines needs to be consistently proven. One way to ensure compliance is 
through ACC of building designs. However, for ACC to work effectively, it requires converting the natural 
language specifications found in regulatory documents into machine-readable constraints (Fuchs & Amor, 2021). 
To achieve this, NLP methods can be facilitated. NLP is a subfield of AI and computer linguistics that focuses on 
the interaction between computer or formula languages and human language. It involves the development of 
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algorithms and language models that enable machines to understand, interpret, generate, and manipulate human 
language effectively (Chowdhary, 2020). NLP utilizes a diverse set of techniques to gain a comprehensive 
understanding of natural language. These techniques have been applied in various studies to facilitate the full 
automation of the code compliance process during information extraction from regulatory documents. 


Schénfelder and König (2021) proposed a Named Entity Recognition (NER) based model that trained German 
building code documents on the pre-trained German corpus BERT (Bidirectional Encoder Representations from 
Transformers). BERT is a language representation model, which is pre-trained on the deep-directional 
representation of unlabeled text. BERT has shown promising results in eleven natural language processing tasks 
such as question answering and language inference (Devlin et al., 2019). Schénfelder and König (2021) used the 
NER technique to label text in the building code as they aimed to train the network based on supervised learning. 
The study demonstrated results of average performance values of 95.7 % precision and 95.2 % recall. The authors 
discussed limitations in the study as the proposed concept could provide good results only to the German corpus 
used. 


Some studies employ another technique besides NER as Zhou et al. (2022), and R. Zhang and El-Gohary (2020). 
Zhou et al. (2022) proposed an approach that deploys NER and Context-Free Grammar (CFG) to formulate a 
generalized rule interpretation framework. The study focused on analyzing regulatory text to create a syntax tree 
representing roles and concepts and developing a deep learning network using transfer learning to label the 
semantic elements in the text. Zhou et al. (2022) presented outcomes with an accuracy of 99.6 % and 91.0 % for 
parsing single- and multi-requirement sentences. The study stated that it focused on quantitative sentences in the 
regulatory documents and that more types of sentences will be addressed. 


In their work, R. Zhang and El-Gohary(2020)introduced a new machine learning-based approach to automatically 
match building-code concepts and relations with their equivalent concepts and relations in the Industry Foundation 
Classes (IFC). The approach was implemented and tested on chapters from the 2009 International Building Code 
(IBC) and the Champaign 2015 IBC Amendments. The preliminary results achieved a semantic matching 
performance of 77 % accuracy for matching building-code concepts to IFC elements and 78 % accuracy for 
matching building-code relations to IFC relations. 


2.3 Knowledge representation 


There are two fundamental concepts for data and knowledge representation, which are ontologies and knowledge 
graphs. Ontologies are used widely in various domains to represent the semantics of a specific domain and provide 
standardized data representation (Ehrlinger & W68, 2016). The Interconnected Data Dictionary Ontology (IDDO), 
developed by Zentgraf et al. (2022), was designed to digitize knowledge from building regulations and 
construction guidelines. It offers a data schema to describe and manage properties in accordance with ISO 23386 
(ISO 23386, 2020). The ontology organizes the digitized knowledge into a hierarchically structured tree of property 
groups and properties, extracted from natural language texts. Its main purpose is to provide an architecture for 
transforming building codes into a structured, knowledge-represented format. Encoded in Web Ontology Language 
(OWL) (Motik et al., 2012), it enables seamless integration and utilization of digitized knowledge in various 
applications. 


Other studies focus on rule formulation from regulatory documents that help the development of ACC frameworks. 
Wessel et al. (2013) proposed an approach with two parts. The first part involves building an ontology to capture 
and represent the standards and information found in regulatory documents. In the second part, the enriched 
ontology is utilized to extract rules. These rules are derived from the information contained within the ontology, 
facilitating the automated extraction and formalization of regulatory guidelines and requirements. The author states 
that there are further improvements they are targeting in the future, which aim to enrich the knowledge base and 
improve automatic reasoning and extraction of rules (Wessel et al., 2013). 


2.4 Code Compliance 


Building codes are a part of the construction work, which aims to ensure the integrity and compliance of the 
planned structure. As the review of building codes is a time-consuming and error-prone process, there are a lot of 
efforts to digitize the process. Accordingly, many research studies use different methodologies to reach the goal of 
tule extraction. Eastman et al. (2009) stated that the process of rule checking is composed of four steps, which are 
(1) rule interpretation and logic structuring ; (2) building model preparation; (3) rule checking, and (4) reporting 
the checking results. 
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Schwabe et al. (2019) proposed an approach that aims to create a model-based rule checking for the planning of 
construction site layouts. The study used the open-source rule engine Drools and Industry Foundation Classes (IFC) 
to extract information from building models and apply rules to the information extracted. Drools is a Business 

Roles Management System (BRMS) based on Java language, which allows users to create decision models that 

are based on rules formulated as when-then statements, to take an action when a condition is true (Browne, 2009). 

Schwabe et al. (2019) stated that they are planning to extend the rule sets and experiments with other rule languages 

in future research. 


In their work, Beach et al. (2015)introduced a method that utilizes the semantic web to achieve a comprehensive 
understanding of regulatory documents. The authors divided the semantic web knowledge into three main concepts, 
where each concept targets a specific knowledge in the regulation document. Additionally, the authors have used 
RASE tags to annotate the regulatory document, which is a markup technique applied to text and it is composed 
of four operators; Requirement, Applicability, Selection, and Exceptions (Hjelseth & Nisbet, 2011). Accordingly, 
Beach et al. (2015) converted the tags into Semantic Web Rule Language (SWRL), which can conceivably detect 
if the regulation is in the scope or not. 


3. CONCEPT 


This paper proposes a concept for the enrichment of digital standards in Level 2 with machine-readable IRs and 
validation rules to create digital standards at autonomy Level 4. The objective is to establish a machine- 
interpretable semantic knowledge base that can be integrated into the BIM methodology to support planning 
processes, ACC, and other BIM practices. To achieve this, an XML-Crawler first processes a digital standard at 
autonomy Level 2 to extract all contained textual information. This textual information is then forwarded into an 
NLP pipeline that extracts all relevant IRs and validation rules from the natural language texts. After the extraction 
the requirements and rules are further processed into suitable data representations and stored in respective 
databases. The stored data is then used to enrich the analyzed standard to make it compliant with autonomy Level 4 
and additional areas of application for the stored data. Additionally, the entire concept is designed to be realized 
using open data formats and interfaces, aligning with the principles of the openBIM concept. 


IR-Checking 


BN 
>” 
Requirements e 
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Rule-Checking 


NISO-STS 


NLP Results 


Fig. 1: Schematic representation of the concept 


The input for this framework is as mentioned above an XML-serialized digital standard at autonomy Level 2. It 
complies with the XML Schema Definitions (XSD) specified by the NISO-STS. At Level 2, the document is 
machine-readable, enabling automated extraction of its structural elements. Granular contents such as chapters, 
sentences, graphics, and tables can be distinguished and extracted from the document (cf. Section 2.1). Moreover, 
the separation between representation and content allows a more streamlined and efficient processing of the 
information (Czarny et al., 2021). 


In the initial step of the developed workflow, the uniform structure provided by the autonomy level is utilized. The 
structure allows the creation of an XML crawler that can efficiently extract all relevant textual information from 
the standard. A document or in this case, XML crawler is a software or script designed to automatically search and 
navigate through documents to extract information from these documents and store it for further processing, 
analysis, or database indexing. Document and XML crawlers are commonly used in search engines and knowledge 
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management systems. The extracted textual data taken from the NISO-STS compliant standard is stored in a plain 
text format for further processing (cf. Fig. 1 (a)). 


Subsequently, algorithms from the field of NLP can be trained using these text-based input data. The aim of the 
conceptualization, implementation, and training of NLP algorithms within the proposed concept is to find two or 
optimally one NLP pipeline to find and extract IRs and validation rules which can be further processed (cf. Fig. 1 
(b)). To enable the training of such NLP pipelines, further preprocessing of the provided textual information may 
be necessary. This preprocessing could involve tasks such as sentence tokenization, lemmatization to revert words 
to their base forms, or conversion of input texts into vectorized representations. 


In the third step, the output generated by the NLP pipeline, which includes the identified IRs and validation rules, 
is further processed (cf. Fig. 1 (c)). The goal is to find suitable data representations for both the IRs and the 
validation rules. This step involves transforming the extracted information into structured and machine- 
interpretable formats, making it more accessible for downstream tasks. To achieve this, databases are created to 
store and manage the extracted IRs and validation rules. These databases are designed to support the creation, 
storage, organization, exchange, and utilization of data in structured and machine-readable formats. Special 
attention is given to utilizing open interfaces to ensure interoperability and flexibility. By leveraging open REST 
(Representational State Transfer) APIs, the databases make the stored information available for seamless 
integration with other systems and applications. 


The last step of the concept is divided into two parts. In the enrichment, IRs and validation rules stored are accessed 
through REST APIs. Leveraging this extracted data, the analyzed standard of Level 2 is enriched with the extracted 
elements to a smart standard in autonomy Level 4. As denoted in section 2.1 a Level 4 standard encompasses 
machine-interpretable content, strongly linked with execution and application information. This capability enables 
direct executability and seamless integration with other relevant information sources (Czarny et al., 2021). By 
incorporating machine-readable features, the standard becomes easily interpretable and executable by machines, 
minimizing the need for manual intervention. This automation enhances efficiency and precision in processes 
reliant on the standard. Furthermore, the integration of execution and application information allows the 
automation of specified actions and enables interactions between interconnected systems and data sources. With 
this enriched autonomy, the standard gains agility in handling dynamic tasks and adapting to changing scenarios. 
ACC and other complex procedures can be executed with greater effectiveness and consistency, supporting 
workflows, and enhancing data interoperability. 


The second part of the final step focuses on other areas of application of the information extracted from the 
standards. Fig. 1 (d) illustrates an exemplary area of application of the automatic validation of IRs and validation 
tules directly at a BIM model, both formally and technically. This type of validation could be integrated into the 
lifecycle of a construction project during a digital building permit review process. There are several other potential 
applications for the provided information. For instance, it could be utilized to define the Level of Information Need 
(LOIN) during the tendering process of buildings or to formulate general modeling guidelines for construction 
projects. Moreover, it can also support the creation and versioning of new or existing standards, facilitating a more 
streamlined and efficient standardization process. 


By offering different possibilities for leveraging the extracted information requirements and validation rules, this 
shows the relevance and impact of the NLP-based analysis of standards and regulatory documents in the AECO 
domains. It enables stakeholders in the construction industry to implement automated validation processes, more 
streamlined tendering procedures, and maintain consistent modeling practices, leading to improved efficiency and 
enhanced collaboration throughout the entire construction lifecycle. 


4. USE CASE 


We aim to develop an approach to convert the knowledge in regulatory documents into a machine-interpretable 
representation. As the regulatory documents are mostly not computer processable, the purpose is to represent the 
knowledge in the regulatory document in a knowledge base, which is computer interpretable. Accordingly, the 
next step is that we apply the rules based on the retrieved data as illustrated in Fig. 1 on a regulatory document. 


The regulatory documents used in this demonstrator are from the Research Society for Roads and Traffic (FGSV) 
(FGSV Verlag GmbH, 2023), which creates the technical regulations for the entire road and traffic system in 
Germany. The regulation’s language is German, but the concept has broad applicability and can be extended to 
any other language. The FGSV has multiple regulations in this area, while we focused on FGSV 499 — RStO12 
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(Forschungsgesellschaft fiir StraBen- und Verkehrswesen, 2012). The proposed use case places particular emphasis 
on one chapter of the mentioned regulatory document. The chapter encompasses introductory text, definitions, 
tables, and interrelated constraints. The approach of knowledge extraction required extensive and comprehensive 
reading and understanding of the document, to extract the correlations and interrelationships in the text. The 
knowledge acquisition is performed manually by highlighting the logical sentences, descriptive texts, and the 
relationships between the tabulated data and the plain texts. 


4.1 Data Preparation 


The subsequent action entails gathering the knowledge extracted in a machine-readable format, to be able to 
formulate rules upon the acquired information. The representation of knowledge is based on the semantic web by 
employing the OWL and the Resource Description Framework (RDF). The ontology hierarchy and relationships 
between classes is a complex stage that requires a significant amount of attention to achieve the accurate 
formulation of knowledge extracted. The software used to create the ontology is Protegé (Musen, 2015). The 
acquired knowledge from the regulatory document is centered around the construction of roadways. The regulatory 
document encompasses different classes of soils and the requirements for the subsoil or substructure according to 
frost sensitivity or other constraints. The soil classes have a minimum thickness delegated to each class and this 
thickness could be increased or decreased whenever exposed to local conditions, for instance depending on the 
zone, underground water conditions, or drainage of the roadway. Each local condition possesses a value, which 
has a positive or negative sign, to raise or lower the thickness of the soil class. 


Table 1: Increased or reduced thicknesses due to local conditions (translated (Forschungsgesellschaft fiir StraBen- 


und Verkehrswesen, 2012)) 


Local conditions A B C D E 
Frost effect Zone I +0 cm 
Zone II +5 cm 
Zone III +15 cm 
small-scale Climate unfavorable climatic influences, e.g., due to North- +5 cm 
differences facing slopes or in ridge locations of mountains 
no special climatic influences +0 cm 
favorable climatic influences with closed lateral -5 cm 


development along the street 


Water conditions in the No groundwater and stratum water down to a depth of +0 cm 
subsoil 1.5 m below the subgrade level 
Groundwater or stratum water permanently or +5 cm 


temporarily higher than 1.5 m below ground level 


Location of the gradient Incision, gating +5 cm 
Terrain height up to 2.0 m +0 cm 
Dam > 2.0 m -5 cm 
Drainage of the Drainage of the roadway via swales, ditches, or the +0 cm 
roadway/execution of embankments 
the edge width 
Drainage of the roadway and peripheral areas via -5 cm 


gutters or drains and pipelines 
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Furthermore, the regulatory document comprises a section on asphalt base courses, where tabulated data shows 
different types of base layers without binders and each type has a thickness depending on the load-bearing capacity 
(cf. Table 1).The document encompasses abundant data about how to construct the roadways and the constraints 
encountered, which affect the substructure or superstructure. The stated knowledge is represented as an ontology 
through different classes and properties. To ensure the ontology's coherence and compatibility, it is structured 
based on the IDDO (Zentgraf et al., 2022), which provides an architecture for transferring building codes into a 
structured format based on the data schema of ISO 23386 (ISO 23386, 2020). As a result, the knowledge is well 
structured through classes and properties in a standardized way. 


4.2 Information retrieval 


As a prerequisite, it is assumed that the conversion of the regulatory document into a machine-readable format 
was achieved successfully. The prevailing stage is to retrieve the information stored in the knowledge base in order 
to be able to apply rules to the retrieved data. The chosen retrieval method is the SPARQL Protocol and RDF 
Query Language. SPARQL is selected due to its ease of use and applicability to manipulate RDF data, which 
supports the proposed framework. The ontology is established and structured using Protegé. To enhance the 
concept's capabilities and prepare for the subsequent step of formulating rules, the queries are interconnected with 
the RDF data. The querying stage starts with trials to ensure the functionality of the ontology, whether the data is 
retrieved correctly or not. The objective is to retrieve the classes, properties, and annotations from the ontology. 
We used to identify the vocabulary from the ontology precisely as SPARQL queries are sensitive and comply with 
the class naming pattern in the ontology. As a case in point, to begin with, we utilized the first part of the regulatory 
document, which aims to find the minimum thickness for soil classes. The soil classes are F7, F2, and F3, where 
each one has a relationship to a local condition. The main class in the ontology, which stores the soil classes is 
named Frostempfindlichkeitsklasse, thus we call the main class in the queries, to get the subclasses, annotations, 
and relationships assigned to it. The objective is to retrieve the thickness of the soil class, as well as the thickness 
assigned to the local condition. As a result, the total thickness of the soil class could be calculated, and the final 
thickness is the sum of the soil thickness and the local condition thickness variable. The final thickness calculation 
will not be executed through SPARQL queries. This step will be calculated during the formulation of the validation 
rule instead. 


?frostklasse rdfs:subClassOf ont:{class_name} . 

?klasse rdf:type/rdfs:subClassOf* ?frostklasse . 

?klasse ont:hasThickness ?Dicke . 

?klasse ont:BounderyValue ?Constraint . 

?Constraint ont:hasThickness ?ConstraintDicke . 

?GroupOfProperties ont:DateOfCreation ?DateOfCreation . 

FILTER (regex(str(?klasse), “{klass}", “i") && regex(str(?Constraint), “{condition}", “i")) 


}} 


GROUP BY ?frostklasse ?Dicke ?klasse ?Constraint ?ConstraintDicke >DateOfCreation 


Fig. 2: Excerpt of an example query for detecting the Soil class, the Thickness, the Local condition, and the Local condition 


As an example, Fig. 2 shows an excerpt of a query formulated for Rule 2, which detects the Frostklasse (Soil class), 
Dicke (Thickness), Constraint (Local condition), and Constraint Dicke (Local condition thickness). The same 
pattern of query formulation is employed for other knowledge data stored in the ontology, taking into account the 
distinctions in parameters. 


4.3 Rule Checking 


Our purpose is to build a framework that consists of three components, which are ontology, SPARQL queries, and 
the Python programming language for conducting additional analysis on the retrieved data and rule development. 
The rule’s structure consists of a condition part if statement and the outcome part Then and Else statements. Thus, 
the rules are formulated to execute the results based on specific conditions. 


For instance, a rule logic is if the user selects a class named X, and if the user selects a local condition named Y, 
then the data for the assigned class based on the local condition selected by the user will be retrieved. Accordingly, 
some actions will be executed and applied to the retrieved data. For example, the aim is to calculate the final 
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thickness of the subgrade, where the thickness depends on the minimum thickness. This is specified depending on 
the soil class and local conditions, to increase or decrease the total thickness of the substructure. As a result, the 
rules calculate the final thickness of the substructure based on the class selected by the user and the local condition 
selected as well. One of the significant aspects taken into account is that not all the data can be retrieved from the 
knowledge base with only one SPARQL query, thus every rule formulated has its query. Three rules have been 
derived from the knowledge extracted from the regulation document. 


4.4 User Interface 


In the upcoming step, a user interface is implemented, which uses the created validation rules and the converted 
excerpt from the regulatory document. The aim is to allow the user to interact with the regulatory document through 
input provision and rule selection. We developed a user interface, which prompts the user for input, such as 
keywords, filters, or specific entities directly into the interface. Accordingly, the system uses the input to construct 
a SPARQL query to retrieve the relevant data in the ontology. Furthermore, the interface provides users with a list 
of pre-defined descriptive rules, each designed to process data based on a certain logic. Additionally, once the user 
selects a rule, the interface provides an instruction statement, which guides the user on what the system needs to 
process the data as illustrated in Fig. 3. 


Rule Selection 


Input the asserted area targeting 


Select Ruleset 


IF ( RulesetSelected ="Die Bestimmung der 
Mindestdicke des frostsicheren Oberbaus ) 


IF ( Type OfClass = F2 && TypeOfConstraint = Zone2) 


What are you looking for in FGSV 499-RStO 12? 


Available Rules 


Dicke: 50 

Klasse. FIBK3 2b BK1,0 
Constraint: Zone? 
‘Constraint Dicke: 5 
Gesamt Dicke ist —> 55 


Date of creation: 2023-07-1IT11:14:40 


Fig. 4: Results based on the class name specified and the selected rule 
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The integration of SPARQL and Python rules delivers a comprehensive solution to interact with the knowledge 
base and perform rule-checking. This process enables the users to interact with the ontology. Fig. 4 shows the 
prototype of the implemented user interface. 


5. CONCLUSION 


This contribution provides a concept to enhance digital standards in Level 2 by incorporating machine-readable 
IRs and validation rules to advance these digital standards into autonomy Level 4. The concept uses an XML 
document crawler and NLP algorithms to analyze the textual information of an examined standard. The extracted 
IRs and validation rules are converted and stored in semantic knowledgebases and are made available via open 
RESTful APIs. With these open interfaces, the rules and requirements can be used to transform the considered 
standard into a machine-interpretable standard of autonomy Level 4. Furthermore, the gathered rules and 
requirements can be used in other areas of application within the BIM methodology (cf. Section 3). 


One of these application areas is presented in detail in the use case (cf. Section 4). An approach is presented in 
which the obtained information can be used as a queryable knowledge base. For this purpose, it is assumed that a 
standard was processed by the NLP algorithm in advance and that the results are available for further processing. 
In the next step, the obtained information is structured according to the IDDO ontology and stored in a graph 
database. The resulting database is used as a semantic knowledge base for a querying assistant in the following. 
Within the assistant, the user can input a fixed set of inputs to query the knowledge base. The entered keywords 
and questions are translated into SPARQL queries which search the knowledge base to provide the user with an 
answer to the given input. The presented use case shows the feasibility of the presented concept with a restricted 
set of possible filters and questions. In future research, it can be considered to extend pre-trained networks, like 
ChatGPT, by incorporating extracted information from standards and regulatory documents. 


Current parts of the concept have already been implemented while others need to be addressed in future work. In 
their work, Kandt and Zentgraf (2023) presented the implementation of an XML-Crawler for NISO-STS-compliant 
standards. With the outcome of this contribution, the process step shown in Fig. 1 (a) can be realized. The 
aforementioned IDDO ontology published by (Zentgraf et al., 2022) can be used to create a graph database for 
information requirements (cf. Fig. 1 (c)). Apart from outlining the data schema, the paper also introduces a method 
demonstrating how IDDO can ensure the accuracy of information requirements through the application of Shapes 
Constraint Language (SHACL) shapes. 


In future research, several areas need to be addressed to realize the whole of the presented concept. Firstly, the 
identification of suitable NLP algorithms for extracting information requirements and validation rules is important. 
In the following step, it is necessary to define and establish a dedicated database to store, manage, and maintain 
the extracted validation rules. Building upon this, methods must be developed to automate the enrichment of a 
Level 2 standard using the extracted information requirements and validation rules, in order to obtain a Level 4 
standard. Additionally, potential application areas within the building lifecycle need to be explored. Possible areas 
where the extracted information could be utilized effectively include Facility Management, refurbishments in 
combination with sustainability assessments, and other areas. 
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ABSTRACT: This paper discusses the potential use of AI in general, and large language models (LLMs) in 
particular, to support knowledge management (KM) in the building industry. The application of conventional 
methods and tools for KM in the building industry is currently limited due to the large variability of buildings, and 
the industry’ fragmentation. Instead, relatively labor-intensive methods need to be employed to curate the 
knowledge gained in previous projects and make it accessible for use in future projects. The recent development 
of LLMs has the potential to develop new approaches to KM in the building industry. These may include querying 
a variety of relatively unstructured documents from previous projects and other textual sources of technical 
expertise, processing these data to create knowledge, identifying patterns, and storing knowledge for future use. A 
proposed framework is defined for the use of LLMs for KM in construction. We will perform preliminary analyses 
on how to train models that can generate information and knowledge required to make decisions in the 
development of specific tasks of fire safety planning. 


KEYWORDS: Large Language Models (LLMs), Knowledge Management (KM), Fire Safety Planning, Expert 
Systems (ESs), Artificial Intelligence (AI), Knowledge Graph, Ontology. 


1. INTRODUCTION 


In the building industry, ensuring fire safety is of paramount importance. Effective fire safety planning plays a 
crucial role in mitigating risks, protecting occupants, and minimizing property damage (Kodur et al., 2019). 
However, the complex and dynamic nature of the building industry poses unique challenges in the realm of fire 
safety planning such as high fuel (fire) load, improper use of the materials, use of new construction materials with 
poor fire performance or longer response times for firefighting (Kodur et al., 2019; Parsamehr et al., 2023). 
Conventional methods and tools for knowledge management (KM) have struggled to adequately address these 
challenges due to the industry's inherent variability and fragmentation (Nikolic & Dakic, 2015). 


Traditionally, fire safety planning has been based on expert knowledge (Maiellaro, 1997; Law & Spinardi, 2021). 
To curate and leverage the knowledge gained from previous projects for future endeavors, the building industry 
has relied on labor-intensive approaches. These approaches involve manually extracting and organizing 
information from disparate sources, making it time-consuming and resource-intensive (Liu, 1995). Consequently, 
the development and implementation of efficient fire safety planning strategies are hindered, hampering the 
industry's overall progress. 


Expert systems (ESs) have been developed to capture and codify this expert knowledge, making it more accessible 
to fire safety professionals. However, ESs have a number of limitations, including the fact that they are difficult to 
maintain and update, and they can be inflexible in dealing with new or unforeseen situations (Tofito et al., 2013). 


Nevertheless, recent advancements in the field of Artificial Intelligence (AI), particularly in large language models 
(LLMs), offer new opportunities for overcoming knowledge management issues in the building industry. LLMs 
are a type of artificial intelligence (AJ) that are trained on massive datasets of text and code (Shanahan, 2023). 
This allows them to learn the relationships between concepts and to generate text that is both informative and 
comprehensive. LLMs, such as OpenAI's GPT-3.5, have the potential to revolutionize the way knowledge is 
accessed, processed, and utilized. 


The utilization of LLMs in fire safety planning offers the potential benefits of accessing and processing extensive 
data from previous projects, identifying patterns and trends in fire safety data, generating novel knowledge and 
insights, and adapting to new or unforeseen situations. These advancements can significantly improve decision- 
making processes and enhance the overall effectiveness of fire safety planning in the building industry. 


Therefore, the primary goal of this paper is to investigate and assess the potential of LLMs in the context of fire 
safety planning. In pursuit of this objective, the current state of the art in expert systems (ESs) employed for fire 
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safety planning will be conducted and LLMs will be introduced as an innovative approach to help define scenarios 
to ESs. Finally, the research will demonstrate the preliminary test results and conclude with future research 
directions in order to realize the full potential of LLMs for fire safety planning. 


2. THE CHARACTERISTICS OF ESS AND LARGE LANGUAGE MODELS (LLMS) 


AI systems process and analyze large datasets with the view of identifying patterns, relationships, drawing 
inferences, recommendations and taking action. With the advancement in AI, conversational AI came of age in 
2010 to deal with the application of Natural Language Processing (NLP) to enable computers to interact with 
humans in a conversational way using natural language. The majority of the developed conversational AI agents 
in the AEC industry are based on the traditional approach to NLP, which requires time for processing the data, and 
users’ interactions are often restricted as the agents are developed with the assumption of happy path users (Saka 
A. et al., 2023). Large Language Models (LLMs) are neural networks with large parameters and are trained using 
self-supervised learning and semi-supervised learning on large datasets. LLMs have improved NLP and shifted 
the direction away from training with labeled data for defined objectives. Generative Pre-trained Transformer 
(GPT) models which are decoder blocks only from OpenAI have gained significant attention and showed improved 
performance from GPT-2 (trained with 10 billion tokens) until the latest GPT-4 released in 2023. GPT models use 
transformer-based models that learn statistical patterns of natural language, enabling them to generate human-like 
language. One of the main advantages of GPT models is their capacity to produce language that is cohesive, fluent, 
and nearly indistinguishable from any text produced by humans. These models have been effectively used in a 
variety of applications, including chatbots, content generation, and machine translation. They can produce answers 
to open-ended questions, making them an important tool for natural language communication. 


Not only have the communication and inference abilities demonstrated to emerge naturally in large language 
models, but even dedicated experiments have shown that chain-of-thought (COT) prompting improves 
performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. Indeed, one’s own thought 
process when solving a complicated reasoning task such as a multi-step math word problem entails decomposing 
that problem into intermediate steps and solving each of them before giving the final answer. A research endowed 
language models with the ability to generate a chain of thought, i.e. a coherent series of intermediate reasoning 
steps that lead to the final answer for a problem. It was reported that sufficiently large language models can 
generate chains of thought if demonstrations of chain-of-thought reasoning are provided in the examples for few- 
shot prompting (Wei J. et al., 2022). The motivations that led to the development of the chain-of-thought prompting 
method match the main goals that must be pursued in the field of knowledge management in construction. First, 
the chain of thought, in principle, allows models to decompose multi-step problems into intermediate steps, which 
means that additional computation can be allocated to problems that require more reasoning steps. Second, a chain 
of thought provides an interpretable window into the behavior of the model, suggesting how it might have arrived 
at a particular answer and providing opportunities to debug where the reasoning path went wrong. Third, chain- 
of-thought reasoning can be used for tasks such as math word problems, commonsense reasoning, and symbolic 
manipulation, and is potentially applicable (at least in principle) to any task that humans can solve via language. 
Finally, chain-of-thought reasoning can be readily elicited in sufficiently large off-the-shelf language models 
simply by including examples of chain-of-thought sequences into the examples of few-shot prompting (Wei J. et 
al., 2022). Among the main findings from this study, we would like to stress that chain-of-thought prompting does 
not positively impact performance for small models, rather it has larger performance gains for more-complicated 
problems. In addition, chain-of-thought prompting via GPT-3 compares favorably to prior state of the art, which 
typically finetunes a task-specific model on a labeled training dataset. 


Although the construction industry is information-intensive and relies on myriad and diverse information from 
different stakeholders, there is a lack of information integration, reuse, and efficient management, all of which 
have an impact on the productivity of the industry. In other words, we must find out how to represent large amounts 
of diverse knowledge in a fashion that permits their effective use and interaction (Goldstein I, & Papert S., 1977). 
An expert system is a computer program whose performance is guided by specific, expert knowledge in solving 
problems. The problem-solving focus is crucial in this characterization because there must be the knowledge of 
central interest that can guide the search for solutions. The word “expert” implies narrow specialization (or focus) 
and substantial competence. It is intended to solve problems that are otherwise solved by people having dedicated 
training and exceptional skills. Thus, the standard of performance for expert systems is in human terms, by 
comparison with people carrying out a particular kind of task (Stefik M., 2014). Nowadays, expert systems are 
being used in a wide range of different interactive roles, such as smart spreadsheets, financial advisors, planning 
assistants, and cognitive coprocessors. In whatever role we employ expert systems, those systems require 
knowledge to be competent, the so-called knowledge base. Such knowledge must be elicited through effective 
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techniques from multiple human experts. Then, techniques to process such knowledge should be adopted to work 
out a robust, reliable, and flexible expert system (Fekri-Ershad, S., 2013). Even in the most successful applications 
where expert systems outperform human experts in their reliability and consistency of results, expert systems have 
less breadth and flexibility than human experts. In this context, LLMs can help enhance the flexibility and 
reliability of expert systems, thanks to their ability to manage large amounts of information, even if it is not 
structured such as written documents, technical standards and regulations. GPT models are trained using vast 
amounts of unstructured text data, enabling them to generate language almost indistinguishable from human- 
generated text (OpenAI, 2023). The synoptic chart included below as Table 1, discusses how LLMs can be a 
potential tool for improving knowledge management in the building industry. LLMs can help widen the scope of 
expert systems thanks to their ability to scrape information from any (even) unstructured knowledge sources as 
soon as it is made available. Hence, they can increase and update the knowledge base of an expert system and set 
up decisional processes that integrate advanced paradigms. 


Table 1: Leveraging LLMs to enhance decision-making in combination with expert systems 


Expert Systems Large Language Models 


Goal Supporting decision-making with a specific Eliciting additional information to broaden out 


task the expert system 
Input Quantitative and structured data Any type of (even) unstructured data 
ouch Combination of logical rules and a Deriving statistical patterns from evidence and 
PP knowledge base to make inferences performing reinforcement learning 
Method | Symbolic AI Transformers 


Answers to questions in human-like language / 


Interface | Digital Chame kouchi 


Recommendations and advice for specific Arguing scenarios representing the dynamics of 


Output A 
scenarios complex systems 


3. DISCUSSION OF GAPS AND COMPLEMENTARY ASPECTS 


Few research works have applied GPT models in construction case studies. They mainly are concerned with 
information retrieval from BIM models and scheduling and sequencing tasks. These investigations helped find out 
inherent limitations, and they paved the way to an extended research review, that investigated opportunities and 
challenges about the practicality of GPT models in the construction industry (Saka A., 2023). These opportunities 
are categorized into different phases of the construction project lifecycle: predesign, design, construction, 
operation, and demolition. 


In the pre-design phase, GPT models incorporated into predesign processes can help simplify decision-making, 
enhance communication among stakeholders, and speed up the discovery of design restrictions and possibilities. 
Similarly, they may support decisions with their powerful natural language processing and machine learning 
capabilities of evaluating data, delivering insightful suggestions, and promoting more efficient cooperation among 
stakeholders through real-time information and predictive analysis. 


In the design phase, GPT models can automate regulatory compliance checks, reducing errors, and streamlining 
the design process. Other representative applications could be quantity take-off and costing (because GPT models 
can be leveraged by providing textual data for the model with necessary cost databases and estimation methods). 
If provided with information on standards, regulations, passive design principles, and renewable energy systems, 
GPT models can be leveraged for improving energy efficiency analysis. 


In the construction phase, GPT models can capture and interpret textual information, enabling a more 
comprehensive understanding of construction project scheduling and logistics by means of a more flexible and 
intuitive approach than mathematical modeling. Other applications include dynamic risk identification and 
assessment by scrutinizing large volumes of project documents and historical data; enhancing progress monitoring 
and reporting by analyzing textual project updates and reports; site safety management; resource allocation and 
optimization; planning and organizing inspection and testing activities; claim and dispute resolution. 


In the operation and maintenance phase, GPT can assist in predicting energy demand patterns, allowing for better 
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planning and allocation of energy resources and providing customized recommendations. They could predict an 
asset's remaining useful life, predicting maintenance and repairs. They can be trained with relevant compliance 
documents, guidelines, and reporting requirements to capture patterns and embedded knowledge. These models 
can be used in compliance evaluation of facility management activities. Other applications might be space 
allocation for the best usage in a facility; generating sustainability reports; analyzing waste-related data. 


In the demolition phase, GPT models can process large volumes of data, in order to: determine optimal demolition 
sequences, and recommend appropriate safety measures; optimize waste sorting and identify recyclable materials; 
offer the prospect of redeveloping and repurposing the site; analyze various risk factors and provide more accurate 
and objective assessments by leveraging their advanced natural language processing (NLP) and machine learning 
(ML) capabilities. 


The analysis performed above suggests that some shared challenges must be faced: 


1. Models are prone to hallucination, i.e. they give sound and plausible information that is not true, which 
reduces system performance and users’ expectations. 


2. Mainly structured data are usually needed in the fine-tuning of GPT models which are often not readily 
available in the construction industry; besides, availability and quality of data have been a major challenge 
for the application of artificial intelligence in the industry. 


3. Although GPT models are large language models and trained on large data sets, their ability to understand 
domain-specific knowledge is limited. As such, there is a need for adequate fine-tuning of GPT models 
and the provision of context to improve their performance in a technical domain like the construction 
industry. Similarly, the regulations in the construction industry are many and vary over time. 


4. GPT models are trained on large datasets. In the construction industry, data such as project design, cost, 
contracts, and schedules could be used as inputs, raising the concern of confidentiality of GPT models 
generating output with sensitive information. 


5. There is often resistance to change in the industry. With the growing application of AI, industry 
practitioners and stakeholders are skeptical about trusting and accepting it. 


6. Liability and accountability challenges are due to the training on a large amount of data. Bias, incomplete 
information and inaccuracies in the training data would affect the output generated by the models. 


7. Deployment of GPT models in the construction industry requires new skill sets called “prompt 
engineering” and training programs for professionals. There are several techniques available for prompt 
engineering such as zero-shot (i.e. no examples are provided for the models to perform specified tasks), 
few-shot (i.e. provide contexts and examples in the prompts for the model) and chain-of-thought (CoT), 
i.e. prompting includes a series of intermediate reasoning steps. 


In this paper, preliminary experiments about fine-tuning of GPT models for construction applications will be 
showcased. One of the objectives is to assess whether fine-tuned GPT models can accomplish a range of tasks 
with greater accuracy. Pre-training may involve unsupervised learning without labels or annotations. After pre- 
training, the model can be adjusted for a variety of tasks to increase the quality and accuracy of the text that is 
produced for that activity, such as language modeling, text categorization, or question-answering. Outcomes of 
our work include preliminary advice about how knowledge sources must be arranged and how queries can be 
prompted. 


4. POTENTIAL FOR INTEGRATION 


This paper discusses the hypothesis that the domain models on which ESs are based can be enriched and expanded 
through the analysis of relevant scenarios with the help of LLMs, by revealing variables and parameters that are 
relevant but are currently either excluded from the ES or incorrectly defined in it due to the modeler’s bias. Four 
components are essential to apply such an approach: scenarios, a LLM, an ES and a validation process. 


Scenarios: the first step is to identify relevant scenarios that represent real-world situations within the domain of 
the ES. In this case, the relevant domain is fire safety, and scenarios of fire events are therefore collected. These 
scenarios should cover a wide range of situations to ensure the system can handle various cases (i.e., different 
types of fire events). Of particular interest are scenarios that may not occur frequently, but that are still important 
to handle, as they may reveal aspects that will need to be incorporated into the model of the domain. 


LLMS: LLMs can be used to identify patterns, trends, and correlations within the scenarios, thus enriching and 
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expanding the domain model. The process of applying a LLM should be iterative, so that the model can 
continuously be refined as new scenarios are identified or as the domain evolves. By querying the LLM, different 
aspects of a scenario can be explored. Each query can build on knowledge gained from the previous queries, 
allowing the analyst to gradually gain a deeper understanding of the domain. LLMs can challenge existing 
assumptions and preconceptions, by revealing factors that might have been previously overlooked. They may thus 
help in avoiding confirmation bias and encourage a more open-minded analysis. Scenarios involving complex 
interactions between different parameters might be better understood by querying the LLM and exploring how the 
different factors influence each other. In addition, the analyst can vary certain parameters in the queries to 
understand how the changes might impact the scenario's outcomes, performing what may be comparable to a 
sensitivity analysis. 


ESs: By repeatedly engaging with a LLM through queries to analyze specific scenarios, the analyst may uncover 
certain aspects that are otherwise ignored, or that are not well understood. By incorporating those aspects in an 
existing structured knowledge graph of the ES, the accuracy of the recommendations provided by the ES for such 
scenarios is increased. This is done by converting the information that is gained into explicit logical statements or 
tule sets to address varying situations, in a way that represents the real-life decision-making process. The logical 
statements should be structured in a format suitable for their integration into the existing knowledge graph of the 
expert system, for example by creating new data fields or modifying existing ones to accommodate the additional 
information. To account for complex relationships between data in the knowledge graphs, hierarchies of rules can 
be established, where higher-level rules encompass general scenarios, and lower-level rules handle exceptions and 
specific cases. This hierarchical structure may support a more detailed and accurate decision-making process. 


Validation: The updated knowledge graph of the ES, and the logical statements on which it is based, should be 
validated by consulting with domain experts, to ensure their accuracy and completeness. Once the enriched 
knowledge graph is integrated into the existing ES, the system should be thoroughly tested with both previously 
used and new scenarios to ensure that it performs accurately and reliably. The iterative nature of the process of 
continuous improvement ensures that the system stays up-to-date and relevant as it learns from new data and 
insights over time. 


To summarize, by enriching and expanding the knowledge graph of an ES through the analysis of relevant 
scenarios with a LLM, the system can become more robust and capable of supporting informed decision-making 
in complex real-world situations. By incorporating the aspects uncovered through the querying of a LLM, and 
defining relevant logical statements, the ES can improve the accuracy of its recommendations for a broader range 
of scenarios. 


5. PROPOSED APPROACH FOR THE USE OF LLMS TO HELP DEFINE 
SCENARIOS FOR ESS 


In the first stage, a LLM is queried that has embedded documents describing scenarios. In GPT (Generative Pre- 
trained Transformer) models, relevant documents can be embedded to help the model better understand context, 
and thus improve the accuracy of answers to queries. The embedded documents can be used to fine-tune the GPT 
model for a specific task. By linking the documents with a query in a single prompt, a unified input is created for 
the model, allowing it to use the context from the documents to provide more accurate answers to the given queries. 
Naturally, the quality and relevance of the documents used in the prompt will play a significant role in determining 
the accuracy of the answers provided, and their selection is therefore crucial. 


In the second stage, the answers provided by the LLM are used to enhance the ES. This can be done directly in the 
ES’s knowledge graph, or indirectly through an ontology on which the knowledge graph is based (Figure 1). 


The ES’s knowledge graphs can be enhanced directly with a set of knowledge units gained from the scenarios by 
querying the LLM. To do so, the LLMs answers, which are provided in the form of natural language text, must be 
converted into a structured format that is suitable for their incorporation into the knowledge graph. Typically, 
entities, relationships, and attributes are extracted from the text, and converted into corresponding nodes, edges, 
and properties in the knowledge graph, while maintaining the graph’s integrity. Depending on the knowledge units 
that are extracted, existing nodes in the knowledge graph may need to be updated or merged, or new nodes added. 
Preferably, the newly added knowledge units are linked with the relevant scenarios on which they are based, to 
maintain traceability and support additional future updates. 


Alternatively, the ES’s knowledge graph can be enhanced indirectly through an ontology on which it is based. To 
do so, a semantic ontology needs to be defined and used to create a conceptual bridge between the two systems. 
This ontology is on the one hand used to define the ES’s knowledge graph and is on the other hand repeatedly 
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updated based on the information gained through the interaction with the LLM. The knowledge units, acquired by 
querying the LLM, are used to define new classes, properties, and relationships in the ontology, thus extending the 
ontology to better represent the domain. Following this, the ES’s knowledge graph is updated based on the 
extended ontology. Preferably, version control is implemented for the ontology to support the tracking of changes 
and management of updates. By repeatedly applying the previously described process, the ES can become over 
time capable of handling a broader range of scenarios with greater accuracy and relevance. 
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Figure 1: Enhancement of an Expert System using a Large Language Model 


6. CHOICE OF THE APPLICATION DOMAIN 


The application domain of fire safety planning was chosen for its paramount significance in the building industry 
and the intricate challenges it poses in safeguarding human life and property from the devastating impacts of fire 
incidents. Fire incidents in buildings can lead to devastating consequences, including injuries, fatalities, and 
significant property damage. 


By leveraging Large Language Models (LLMs) to systematically analyze reports of real-life fire events, pertinent 
insights into the factors influencing fire incidents and the efficacy of existing fire safety measures can be extracted. 
The LLMs provide natural language responses, which can be converted into structured knowledge units suitable 
for integration into the Expert Systems (ESs) knowledge graph. 


Querying reports of fire events and subsequently expanding the ontology based on these queries aligns coherently 
with the standards set forth by the National Fire Protection Association (NFPA), particularly in Chapter 5, clause 
5.5.3. (National Fire Protection Association, 2017). The NFPA emphasizes the criticality of selecting pertinent 
scenarios, both from predefined options and those deemed relevant by designers, to ensure a comprehensive 
approach to fire safety planning. 


The process of querying, extracting insights, and expanding the ontology facilitates the inclusion of diverse 
scenarios, encompassing not only common cases but also rare and significant events. The adherence to NFPA 
standards ensures that the fire safety planning process remains thorough and adaptive, contributing to informed 
decision-making and improved fire safety strategies in building environments. Consequently, this integration 
demonstrates the potential to advance knowledge management in the fire safety domain, elevating the overall 
efficacy of fire safety planning measures. 


In the upcoming chapter, a series of tests will be conducted using selected case studies as examples such as, NFPA 
case study of nightclub fires (Duval, R.F., 2006): 
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e Rhythm Club, Natchez/Mississippi 
e Cocoanut Grove, Boston/ Massachusetts 


These case studies, derived from published reports, have been carefully chosen to represent diverse fire safety 
scenarios encountered in real-world building environments. The adherence to NFPA standards ensures the 
appropriateness and validity of utilizing published reports for the case studies. These tests will provide valuable 
insights into the efficacy of the proposed approach, showcasing its potential to enhance knowledge management 
in fire safety planning and inform more informed decision-making processes. 


7. IMPLEMENTATION 


To expand the relevant ontology, a modular and contextually aware approach is devised to querying a knowledge 
corpus for “low temperature” knowledge of a predetermined format, as represented in the ontology. For this, the 
following components are used. 


7.1 Agent-based Methodology 


To support the querying of the said corpus, agents will be created and utilized. When creating an LLM-based agent, 
understanding the nature of the action, or “step” executed by the agent is paramount. The steps can be categorized 
into: 


e User-guided: A user aligns the agent with the target by adding specific information. 
e Pre-prompted: The system operates with a generic instruction set to achieve the desired outcome. 
e Autonomous: The agent compiles a list of tasks before engaging in user-guided or pre-prompted tasks. 


Each of these steps necessitates a mechanism for handling context, as all prompts require appropriate context to 
yield optimal results. The context influences the stochastic and statistical process of token generation. In this 
article, we have focused on one step, node querying, which is a pre-prompted step. All steps must be accompanied 
by the right context. The essence of context handling lies in the creation, storage, and injection of the right bit of 
context in the proper format (length, form, keywords, wrapping), providing an agent with a solid foundation for 
action. This mechanism embodies an object-oriented, modular scheme where pre-prompts and context are treated 
as retrievable objects for injection. These objects are stored in a tree graph, and their predefined format can be 
utilized for stable injection. Context objects must be designed according to the types of steps and the theme of the 
project. Each theme presents different objects and relationships, demanding specific syntax to harness hidden 
semantic information. 


The creation of context, being paramount, can be realized through two processes: 


e Demanding a Specific Output Format: As part of the agent's response, a certain output format is required 
for the context object. 


e External Ingestion of Information: Utilizing an external mechanism to ingest existing information and 
represent it in a context object. 


As the methodology suggests, the step presented here could be used to continuously enlarge the scope of querying 
or enable working in parallel on other tasks while offering the right context for them. 


7.2 Prompting 


To establish the correct behavior for the agent, a system prompt is created first, based on the ontology description 
and a leading “persona” statement (a role with which the agent’s behavior will be aligned). 


Example: 


You are a Fire Evacuation Specialist working with an ontology. 


The evacuation of buildings in fire emergency situations is a problem for 
which solutions must be found that can help the occupants of these spaces, 
guiding them along their route until they are safe. The purpose of the 
ontology is to build a knowledge Graph that can contribute to a better 
understanding of this topic, as well as to the development of solutions or 
systems for the evacuation of buildings more capable of guiding the 
occupants of these spaces to a safe place. 


Ontology Domain and Scope are: 
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Domain: Evacuation of buildings under fir mergency 


Focus: Recommendation of evacuation routes in real time based on contextual 
information obtained from loT devices. 


Contribution: Contributes to a solution for the evacuation of buildings 
under fire. 


Usage: In fire building evacuation systems. 


Purpose: Support interoperability between IoT devices, occupants and other 
systems. 


Next, a ontology python class is created, which is an object-oriented, programming language-resembling 
representation of an ontology class. The instance can have a description generated for it using the ‘str’ function. 


Example: 


data_propertyl = DataProperty("Class Name", "Property Namel", 
"Descriptionl") 


ontology node = OntologyClass(name="Class Name", comment="Comment about the 
class.", dataproperties=[data_propertyl, data_property2]) 


The attributes are described as: 


f"The attribute {self.name} (an attribute to {self.name}) asks: 
{self.question}" 


And the class is then described as: 


f"The class {self.name} is (are) {self.comment}\nFor finding information to 
fill in attributes, consider the following:\n{dataproperties str}" 


This representation is used as the “query_node_description”. 
Next, an example of the “output_format’ we demand the agent to follow is given: 
Formulate your response based on the rule: 


1. If you don't know the answer, just respond “Don't know”. DO NOT try to 
make up an answer. 


2. If the question is not related to the context, respond that you are 
tuned to only answer questions that are related to the context. 


3. DO NOT add anything else to the response. 
4. Be clear and precise. 


5. Fill the template with information from the context, fill all the 
requested data types with information. 


Response generic format: 


Name: <name>, Description: <description>, Data properties: [<data 
properties>] 


This information is then accompanied by context queried from the knowledge corpus. Texts are extracted from a 
PDF file and converted into numerical vectors using a specialized PDF processor. This vector representation of 
the text is stored and then queried using the node description for the retrieval of the top-k results - the context 
holding the “high-temperature” information. 


The augmented prompt is constructed using this formula: 


Use the following pieces of context to formulate an ontology class instance 
representation. 


Context: 
{context}, {query node description}, {output_format} 


The augmented prompt is then sent to the GPT-4 API for processing and lowering the information’s temperature 
into the required structure. 
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The above described process will result in the successful curation of information, in a desired structure of one 
class instance, as it is portrayed in the text. The following are key aspects of the process: 


e Data Collection: Utilizing the GPT API to get data from a knowledge corpus to aggregate and 
accommodate data. 


e Knowledge Base Enrichment: Briefly describing the methodology to enrich the knowledge base, 
including the integration of ontologies, vector handling, and prompt engineering. 


e Class Instance Formation: Achieving the structured representation of one class instance from the text, 
adhering to the defined ontology and prompt templates. 


8. CONCLUSIONS 


In conclusion, this paper discusses the transformative potential of Large Language Models (LLMs) in Knowledge 
Management (KM) within the building industry, with a particular focus on fire safety planning. By harnessing the 
computational prowess of LLMs, a framework is delineated capable of mining unstructured documents from 
previous projects, processing this data into actionable knowledge, and preserving it for future endeavors. 


The challenges for future development of such a framework are multifaceted, and include: 


e Accuracy Benchmarking: Creating a mechanism to assess the correctness of the curated information is 
crucial for ensuring the agent's trustworthiness. This involves evaluating the relevance and consistency 
of the curated low-temperature information, and employing a back-propagation process to analyze its 
alignment with the specific domain, ontology, and templates. 


e Step Size in Respect of Token Attention Span: The token attention span of the model must be meticulously 
managed to ensure that the curated information is accurate and properly compiled when augmented into 
the prompt template. This encompasses careful handling of tokenization, embedding, prompt 
consciousness, and robust mechanisms for large ontology structures and knowledge corpus. 


e Fractured Information Retrieval: Addressing the challenge of incomplete knowledge sources is vital to 
collect comprehensive information without resorting to premature termination by the model. 


e Unwanted Event Amalgamation: Further refinement in the ingestion of PDF files is needed to prevent the 
creation of class instances that may not be accurately represented in the text. 


The outlined challenges, such as accuracy benchmarking and token attention span, underscore the complexities 
inherent in this approach. Nevertheless, surmounting these hurdles may pave the way for transitioning from labor- 
intensive curation methods to an automated, intelligent system. This paradigm shift has the potential to change 
how knowledge is managed in an industry characterized by fragmentation and variability, offering a more 
streamlined and effective method for capitalizing on past insights to fuel future innovation. 
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ABSTRACT: Decarbonizing the construction sector has become an imperative global agenda, with electric 
machinery playing a pivotal role in realizing this objective. This research concentrates on devising an operational 
scheduling optimization method for electric ready-mixed concrete vehicles (ERVs) — a groundbreaking, eco- 
friendly intervention for the construction sector. We commence by outlining a systematic problem definition for the 
ERY operational process, considering the distinctive characteristics of electric vehicles and ready-mixed concrete 
(RMC) delivery tasks. The entire process is then conceptualized as a Markov decision problem (MDP), which 
enables sequential decision-making. We subsequently develop an enhanced model-based reinforcement learning 
technique, named parallel-masked-decaying Monte Carlo Tree Search (PMD-MCTS), for efficient resolution of 
the MDP. The entire system is authenticated via a real-world case study, and the PMD-MCTS's performance is 
juxtaposed against existing benchmarks. The results demonstrate the appropriateness of the proposed MDP 
formulation for tackling RMC delivery tasks. The PMD-MCTS algorithm and one of its ablation algorithms (PM- 
MCTS) have demonstrated superior performance compared to other benchmarks in either cost reduction or delay 
minimization, with PMD-MCTS requiring 30% less computation time than PM-MCTS. 


KEYWORDS: Electric vehicle, Ready-mixed concrete delivery; Scheduling optimization; Model-based 
reinforcement learning; Monte Carlo Tree Search 


1. INTRODUCTION 


The escalating issue of carbon emissions, largely attributing to global warming, has necessitated decarbonization 
as a global imperative for sustainable development (Sinha & Chaturvedi, 2019). As a result, decarbonization has 
emerged as a global priority for sustainable development (Bogachkova, Guryanova, & Usacheva, 2022). 
Construction industry activities are a significant source of environmental pollution, responsible for approximately 
one-third of carbon emissions (Gan, Chan, Tse, Lo, & Cheng, 2017). Specifically, the ready-mixed concrete (RMC) 
production accounts for a large portion of global emissions (Olanrewaju, Edwards, & Chileshe, 2020). In addition, 
RMC is a still-growing market due to the rise of green building construction and the urbanization in developing 
countries (Hart, Nilsson, & Raphael, 1968). Palaniappan, Bashford, Li, Fafitis, and Stecker (2009) indicated that 
the transportation of RMC represents a major component of energy use and emissions. Therefore, optimizing the 
scheduling of RMC delivery is crucial for a greener construction industry. Due to the significant development of 
battery technology and automation, the electric drive technology has been regarded as a promising solution for 
improving the sustainability of the construction industry (T. Lin et al., 2020). Truck manufacturers have recently 
developed several electric RMC trucks that aim to implement emission-free transport in the construction industry 
(Volvo Trucks delivers the first heavy-duty electric concrete mixer truck to CEMEX, 2023). However, the academic 
focus on construction electric vehicles (CEVs), a cross-domain technology integrating the unique properties of 
electric vehicles (EVs) and the construction industry, has been limited. This research aims to bridge this gap by 
focusing on the scheduling optimization of electric ready-mixed concrete vehicles (ERV) to further the cause of 
greener construction. 


In the EV domain, several battery-related factors have been extensively studied, such as battery status, charging 
rates and prices, and charging station locations (Turan, Pedarsani, & Alizadeh, 2020). However, most EV-related 
studies are not directly applicable to the construction industry due to its unique characteristics and requirements. 
Furthermore, existing CEV-related studies mainly focus on hardware improvements such as drivetrain (Tong, Jiang, 
Tong, Zhang, & Wu, 2023) and transmission system (Tan, Yang, Zhao, Hai, & Zhang, 2018), with little attention 
paid to the management-related topics. Studies related to RMC production and delivery have primarily focused on 
developing optimization formulations and optimization algorithms. For instance, P.-C. Lin, Wang, Huang, and 
Wang (2010) formulated the RMC delivery as a job shop problem, where each RMC delivery represents a job 
operation carried out by one of the trucks that correspond to the workstations. Z. Liu, Zhang, Yu, and Zhou (2017) 
proposed a time-space network that combines RMC production and vehicle dispatching, and the problem is 
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optimized by a heuristic algorithm. Nonetheless, these studies overlook the specific properties of ERV, especially 
those related to the battery, such as charging and energy consumption. 


To bridge this gap, our study proposes a scheduling optimization methodology for ERV dispatching. We first 
provide a detailed problem definition for ERV dispatching, incorporating the unique features of both EVs and 
RMC delivery tasks. The problem is then modeled as a Markov decision process (MDP) to capture the sequential 
logic of the RMC dispatching task. Finally, we propose an improved model-based reinforcement learning 
algorithm to solve the MDP problem. This algorithm, developed using a state-of-the-art Monte Carlo Tree Search 
(MCTS), is enhanced with a state-dependent action masking and a decaying searching strategy. 


2. METHODOLOGY 
2.1 ERV Operation Problem Definition 


The operation of Electric Ready-mixed Vehicles (ERVs) comprises five components: a) The construction site and 
b) the ready-mixed concrete (RMC) plant, which are the locations where ERVs are prepared and RMC is poured. 
c) RMC mixer and d) charging station, which are machinery installed at the RMC plant for loading RMC and 
charging EVs respectively. Lastly, e) ERVs are the vehicles for dispatching RMC. For the purposes of this study, 
we assume pumps are pre-installed. The operational process of ERVs can be partitioned into three sections: the in- 
plant process (IP), the midway process (MP), and the on-site process (OP). 


2.1.1 In-plant process 


Prior to the delivery of the RMC batch to the construction site, it is imperative that an ERV is adequately prepared 
with the required RMC and sufficient battery power at the RMC plant. The specificities of two in-plant processes 
are as follows: (1) IP1-RMC Production and Loading: The RMC mixer produces and loads RMC onto ERVs 
according to the demands, which are typically determined based on the specific requirements of various 
construction sites. The following assumptions are made: a) RMC mixers can produce any type of required RMC, 
and the loading rate is set constant in this study for efficient validation. b) The RMC plant can load RMC onto 
multiple ERVs simultaneously, eliminating any queuing time for the loading (Z. Liu et al., 2017). c) Each ERV 
should be fully loaded unless it delivers the last batch of the target construction site, which can be smaller than an 
ERV’s capacity. d) The plant owns various types of ERVs, and their loading capacities and operation costs are 
different (Z. Liu et al., 2017). (2) IP2-Charging of ERVs: When an EV’s battery is less than its required degree, 
it will be recharged by charging stations. All the charging stations are installed only in the RMC plant, which is a 
regular practice in current ERV providers. a) We assume that the charging rates and costs are constant, but a basic 
cost is set to avoid frequent charging since launching the charging station is power-consuming. b) Multiple ERVs 
can be recharged simultaneously using multiple charging stations. c) ERVs have different battery capacities, and 
they can be recharged to a certain level between the existing status and the fully charged status. 


2.1.2 Midway process 


Following proper preparation, all ERVs should depart from the RMC plant to their corresponding construction 
sites. Two types of midway processes can be considered: (1) MP1-Plant to Site: a) A qualified pump is assumed 
to be installed on the construction site before the first arrival of ERV. b) To avoid unnecessary battery drainage 
while waiting with loaded RMC, it is assumed that the ERV will depart only if its arrival time is not earlier than 
the demand time of the target site. If the arrival time is earlier than the demand time, the ERV’s RMC loading time, 
battery charging time, and travel time will be delayed. c) This study assumes that the ERV has the same traveling 
speed and battery consumption rate under the loading status. (2) MP2-Site to Plant: After unloading a batch of 
RMC at the construction site, the ERV returns to the plant and prepares for the next delivery batch of RMC. a) It 
is assumed all RMC have been unloaded, and the ERV is in an empty status. b) All ERVs return with a fixed 
traveling speed and energy consumption rate in the empty-load status. 


2.1.3 On-site process 


ERVs are assumed to arrive at the construction site not earlier than the demand time, which allows for the pouring 
task to commence promptly upon the ERV’s arrival. a) Pourings are preferred to be consecutive, but delivery delays 
are also allowed in real applications. The construction sites claim a maximum for the delivery interval. b) Although 
static during the pouring process, ERVs remain operational, and a fixed battery consumption rate is assumed. c) 
This study supposes each construction requires only one ERV for pouring, and the pouring rate is fixed. 
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2.2 Modeling the ERV Operation Processes via an MDP Model 


Based on the ERV operation problem definition, the maximum RMC batch of each construction site can be inferred 
by considering the minimum ERV capacity and site demands. This allows the ERV operation process to be modeled 
as a sequential decision-making problem, with the goal of optimizing the dispatch sequence of all the ERVs. During 
each dispatch, the ERV delivers a batch of RMC to a certain construction site with a certain battery level. Markov 
Decision Process (MDP) is a potent model-based method for sequential decision-making, which can be solved 
byiteratively evaluating the reward function for all potential states and actions until convergence to the optimal 
value (Zhang et al., 2020). Therefore, an MDP formulation is proposed for sequential coverage pattern analysis, 
represented by a four-element tuple (S, A, Ts,a, Ra). S is the state space, A is the action space, Ts,a is the state 
transition operator, and Ra is the reward function. These elements are discussed below. 


2.2.1 State 


S is the state space, s € S is the current state, which is a tuple of 2N; + 2N2components. N; denotes the maximum 
number of ERVs owned by the RMC plant, while N2 represents the maximum number of construction sites in 
demand. The MDP state comprises four parts, namely ERV’s latest available time (LAT), ERV’s battery status, the 
construction sites’ latest demand time (LDT), and the quantities of undelivered RMC. The details of the MDP 
states are illustrated in Table 1. 


Table 1 The definition of the MDP state. 


State number Meaning Related component Format 
[1, N1] The latest available time (LAT) of the ERV Day-hour-minutes 
Ist to the N1th ERVs. 
[N1+1, 2N1] The ERVs’ battery states on their ERV kWh 
corresponding LAT. 
[2N1+1, 2N1+N2] The latest demand time (LDT) of the Construction site Day-hour-minutes 
Ist to the N3th construction sites. 
[2N1+N2+1, 2N1+2N2] The quantities of undelivered RCM Construction site m3 


for a certain construction site. 


For clarification, we consider a scenario with two ERVs, two construction sites, and the state is (9:00, 9:30, 120, 
60, 10:30, 12:00, 50, 25) (as shown in Fig. 1). It means that the first ERV is available after 9:00 with a 120-kWh 
battery, and the second ERV is available after 9:30 with a 60-kWh battery. The first construction site has a latest 
demand time (LDT) of 10:30 and requires 50 m? of RMC. The second construction site has an LDT of 12:00 and 
requires 25 m? of RMC. 


state (9:00, 9:30,}120, 60,]10:30, 12:00,}50, 25) 
States of the ERVs States of the Construction Sites 


Fig. 1 An illustration of the MDP state 


2.2.2 Action 


A is the action space, where a € A is the action taken based on the current state. Each action is denoted by (e, c, 
b), where e is the serial number of the ERV, c is the serial number of each construction site, and b is the departure 
battery level. For the battery level, 0 means that the battery is kept unchanged, and its accuracy can be determined 
based on the user’s computational capacity. Following the same scenario in section 3.3.1, we assume a battery 
accuracy is 5. Then, action (1, 2, 4) means the first ERV deliveries a batch of RMC to the second construction site 
with an 80-percentage battery, and action (2, 1, 0) means the second ERV is dispatched to the first construction 
site without charging. 


2.2.3 Transition 


Ts,a is the absolute transition that action a in state s at step ¢ will lead to state s’ at step ¢+/, as the transition is fully 
under control in this study. Apart from the state information, additional parameters are clarified for the state 
transition (Table 2). 
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Table 2 Fixed parameters for the state transition. 


Fixed Definition Related element Unit 
Parameters 
The RMC loading rate of the mixer. RMC mixer m?/min 
p The power of the charging station. Charging station kw 
gie] The battery capacity of an electric vehicle ERV kWh 
x The battery accuracy ERV / 
mf[e] The RMC capacity of the e” ERV. ERV m° 
vi The vehicle speed of loading status. ERV km/hr 
Vz The vehicle speed of empty status. ERV km/hr 
Ty The battery consumption rate for the ERVs to travel under ERV %/km 
loading status. 
T2 The battery consumption rate for the ERVs to travel under ERV %/km 
empty status. 
T3 The battery consumption rate for ERVs to conduct the ERV %/m3 
pouring task. 
dic] The distance from the plant to the c” construction site. Construction site km 
w The RMC pouring rate. ERV m?/min 


Firstly, the quantity of the RMC required by the target construction site (s[2N, + Nz + c]) is compared with the 
capacity of the ERV (m[e]) to get the quantity of delivered RMC RMCogetiverieq (Eq.(1)). Subsequently, the 
loading time tipading can be obtained by Eq.(2). Given the action component b and the battery status (s [N, + e]), 
the charging time tcnarging can be calculated using Eq.(3). The departure time from the factory to the construction 
taeparture can be determined by Eq.(4). Based on Eq.(5), the start time of the current dispatch tstart can be 
obtained by comparing the ERV’s arrival time (s[e] + tcharging + tioaaing + tdeparture) With the construction 
site’s LDT (s[2N, + c]). Further, Eqs.(6) and (7) are used to calculate the pouring time tpouring and return time 


t ; 
— RMCgetiveriea = Min(s[2N, + Nz + c],m[e]) (1) 
tioading = RMCgetiveriea/l (2) 
tenarging = È* gle] — SIM, + e1)/p k 
tdeparture = d[c]/vı (4) 
tstart = min(s[e] + tioading + tcnarging + taeparturerS[2N; + cl) (5) 
tpouring = RMCgetiveriea/W (6) 
treturn = 4[c]/v2 (7) 


The LAT of the current ERV s[e] is updated by adding the pouring time and return time to the start time (Eq.(8)). 
The ERV’s battery level s[N, +e] is updated according to Eq.(9), where the second term is the battery 
consumption during traveling, and the third term is the battery consumption during the pouring task. The target 
construction site’s LDT s[2N, + c] is updated by adding the pouring time to the start time (Eq.(10)). Further, the 
required quantity of RMC s[2N, + N, +c] can be updated by subtracting the quantity of delivered RMC from 
the target construction site’s initial requirement, as shown in Eq.(11). 


s[e] = tstart + tyour t+ treturn (8) 
b (9) 
s[N, +e] = (- — d[c] * (rı + r2) — RMCgetiveriea * r) * gle] 
s[2N, +c] = tstart + tpour (10) 
S[2N; + Nz + c] = s[2N, + N3 + c] — RMCgetiveriea (1) 


2.2.4 Reward 


Ra is the immediate reward of action a. In the RMC dispatch task, the objective of the RMC plant is to minimize 
the operational costs and adhere to the dispatch rules, such as avoiding exceeding the maximum pouring interval. 
Meanwhile, the construction sites aim to minimize the total delay for the pouring task. Therefore, the total 
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operation costs 7 and the dispatch delay r4 are selected as the two primary reward components. re can be calculated 
by Eq.(12). cı ($/min) is the unit cost of the ERV operation, cz ($) is the fixed cost of opening the charging station, 
which aims to avoid frequent charging, and c3 ($/kWh) is the unit price for ERV charging. The relevant costs of 
RMC production are not considered as the quantity of the RMC demand is fixed. According to Eq (13), ra can be 
calculated by comparing ERV’s arrival time with the construction site’s LDT. 


foe = Cy [e] $ (roading + tcharging + tdeparture + tyouring + treturn) — C2 (12) 


b 
* Boolean(tcharging > 0)— C3 * G * gle] — s[N, + e]) 


Tq = —max (s[e] + tcharging + tioading + taeparture — s[2N, +c], 0) (13) 


In addition, a significant negative reward r, is generated if an invalid action is taken. Three types of invalid actions 
have been identified: 1) Actions that head to the construction site without any demand for RMC, as shown in 
Eq.(14); 2) Actions with a battery level below the current battery level or the minimum battery requirement 
(Eq.(15)). 3) Actions that result in a dispatch delay that exceeds the maximum interval ô (Eq.(16)). When the 
RMC demands of all the construction sites are fulfilled, a great positive reward ry is generated. Finally, the total 
reward can be calculated by Eq.(17), where œ and œ, are importance hyper-parameters. Apart from ry all other 
reward components are negative. 


InvalidAction 1: s[2N, + N, +c] >0 (14) 

InvalidAction 2: b (15) 
> max (s[N, + e], d[c] * (b,[e] + ba [e]) — RMCgetiveriea * b3[e]) 

InvalidAction 3: s[e] + tenarging + tioading + taeparture — S[2N; + c] < 6 (16) 

r= Te +a * T, +æ *Tůa t (17) 


2.3 Optimization Using PMD-MCTS 


Many reinforcement learning methods have been developed to solve the MDP problem, including model-free 
algorithms and model-based algorithms (Sutton & Barto, 2018). Specifically, the model refers to the state transition 
function 7;,, and the reward function R4 of the MDP problem. Compared with model-free methods, model-based 
reinforcement learning has the great potential to make RL algorithms more sample efficient (Wang et al., 2019). 
MCTS is a model-based RL algorithm that plans the best action at each time step (Browne et al., 2012). It is an 
effective heuristic search algorithm for solving episodic decision-making problems when the underlying search 
spaces are computationally expensive (B. Huang, Boularias, & Yu, 2022). However, MCTS relies on a large 
number of interactions with the environment emulator to construct the search trees for decision-making (Browne 
et al., 2012). To mitigate the high time complexity of classical MCTS, this section develops an improved MCTS 
algorithm, named parallel-masked-decaying MCTS (PMD-MCTS). Specifically, the state-of-the-art parallel 
MCTS algorithm, WU-UCT (A. Liu et al., 2018), is adopted as the fundamental model. It is further improved by 
incorporating a state-dependent action masking operation and a decaying search strategy. The details of the 
algorithm are introduced as follows. 


2.3.1 Fundamentals of WU-UCT-based parallel Monte Carlo Tree Search 


MCTS adopts a tree-search method that incrementally extends a search tree from the current environment state 
(Luo et al., 2022). Each node denotes a visited state, and each edge from this state denotes an action that can be 
taken at that state, leading to a landing node that denotes the state after the transition. Typically, MCTS performs 
four sequential steps repeatedly: selection, expansion, simulation, and backpropagation (Fig. 2 (a)). The selection 
step starts from the root node (current state) and recursively selects an existing child node according to a tree policy. 
The process ends when it reaches a leaf node or other termination conditions. One of the most commonly used 
node-selection policies is the Upper Confidence bound for Trees (UCT), and the UCT value a, can be calculated 
by Eq.(18). Here, C(s) represents the child node set of the current node s; V, is the average value estimation for a 
certain child node s’, denoting the exploitation; The second term is the uncertainty of the value estimation, 
denoting the exploration. N, and N,, denote the number of times that nodes s and s’ have been visited, while £ 
is the factor that controls the trade-off between exploitation and exploration. During the expansion state, a new 
child node is added to the selected node, and the value of the expanded node is estimated by performing a model- 
based simulation until termination. The simulation process follows a certain policy (e.g., random). Finally, 
backpropagation recursively updates the statistics {V,, N,} from the expanded node to the root node along the 
selected path. According to Eq.(19), the visit count of each node is increased by 1. The latest value estimation of 
each node can be obtained by Eq.(20), where y is the reward discount factor, and a; is the action that turns the 
state s, to the state s;+;. It should be noted that the leaf node Sp obtains its value from the simulation step. Finally, 


743 


CONVR 2023. PROCEEDINGS OF THE 23™ INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


each node’s average value estimation is updated according to Ep. (21). 


2 log N, 
a, = argmax 4 Vp, + P |——— (18) 
SIEC(S) Ns, 
Ns, = Ns, +1 (19) 
Vo, = R(sz, a+) + Worst (20) 
Ven = (Ns, — 1) Ve, + Vo) /Ns: (21) 


Parallelizing MCTS over multiple workers is an efficient method to improve the optimization speed. During the 
parallel computation, workers typically operate at different steps as the simulation and expansion processes are 
slow (Fig. 2 (b)). As a result, the update of statistics {V,, Ns} may become outdated for workers, and the statistics 
loss becomes inevitable. However, the latest N, is available as soon as a worker initiates the computation since 
we only need to know if the node is selected. Therefore, the WU-UCT algorithm partially addresses the information 
loss by introducing another quantity O into the classical UCT (Eq.(22)), which counts the number of computations 
that have been initiated but not completed (light dashed blue lines in Fig. 2 (b)). The updated UCT effectively 
balances exploration-exploitation tradeoff by considering incomplete samples, and the node values can be updated 


according to Eqs.(23) and (24). 
2 log(N, + O,) 
a, = argmax į V, +6 | 22 
i pees B Ns, + Os, Á ) 


Incomplete update: O, = O, +1 (23) 
Complete update: Os = Os— 1 
Ns = N;+1 mm 
(a) Classical MCTS 
Selection Expansion Simulation Backpropagation 
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Fig. 2 The relationship between classical MCTS and PMD-MCTS 
2.3.2 Action masking 


The ERV operation involves complicated rules, and the valid action spaces usually vary under different states. 
Typically, RL algorithms sample the action from a space containing actions of all states and assign a significant 
negative reward for invalid actions. However, this kind of invalid action penalty is challenging to explore, 
particularly when the state is complicated, even for the very first reward (S. Huang & Ontañón, 2020). Hence, this 
section proposes a state-dependent action masking method to improve MCTS’s efficiency (Fig. 2 (b)). First, a 
complete action space Ao is generated, which contains all combinations of the ERV’s serial number, each 
construction site’s serial number, and the departure battery level. Then, invalid actions (Eqs.(14)-(16)) of Ao are 
updated under each state (circles in Fig. 2 (b)), followed by invalid action masking. Specifically, V, (Eq.(22)) of 
invalid actions are set as a large negative number M (e.g., M = -1 x 10°) during the expansion stage. Consequently, 
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only valid actions will be expanded. During the simulation stage, actions are randomly selected from valid 
candidates, while invalid ones are ignored. In practical implementations, vectorization is adopted for speeding up 
the masking operations. 


2.3.3 Decaying search strategy 


During the implementation of MCTS, an initial MDP state is set as the start for optimization. The best child node 
of the current state is set as the output when the number of iterations is larger than a threshold value of Ns. Then, 
the updated state becomes the new start, and the process iteratively continues until the MDP ends. Intuitively, the 
size of the Monte Carlo tree will decrease gradually, as the quantity of undelivered RMC decreases. Hence, the 
last stages of MDP may not require many iterations, and this section proposes a decaying threshold to ensure both 
optimization accuracy and efficiency. The decaying strategy is designed based on the remaining demand for RMC, 
as shown in Eq.(25). Nso is the maximum number of iterations determined by the users, Qip is the remaining 
required quantity of RMC, Qep is the total RMC demand, and e is the Euler's number. 


Qdemand 


e Qtotal (25) 
Ns = Noo *—— 


3. VALIDATION 
3.1 Scenario Setup 


As the use of electric RMC vehicles is a relatively new solution in the construction industry, a customized dataset 
for this purpose is currently unavailable. Therefore, it is reasonable and acceptable to utilize data from previous 
RMC delivery studies to establish the proposed MDP model. Hence, we extracted the basic configurations of sites 
and RMC vehicles from the dataset of (Z. Liu et al., 2017). The dataset was determined based on a real case, 
including distances between the sites, RMC demands of the construction sites, RMC loading rate, ERVs’ capacities, 
and relevant costs. We updated certain assumptions from (Z. Liu et al., 2017) in more detail. For example, we 
provided vehicle speeds for traveling time calculation and determined the battery-related factors based on actual 
reports (e.g., the charging rates). Table 3 describes the shared parameters, Table 4 indicates the information on the 
construction sites, and Table 5 shows ERV information. Two objectives are optimized: a) objective | aims to 
minimize the operation costs for the RMC plant, and b) objective 2 aims to minimize the dispatch delay for the 
construction sites. 


Table 3 Information of the shared parameters. 


Shared parameters Value Unit 
RMC loading rate 2 min/m? 
Battery charging power 20 kw 
Battery accuracy 5 / 
Vehicle speed of loading status 40 km/hr 
Vehicle speed of empty status 80 km/hr 
Battery consumption rate for ERVs to 1 %/km 
travel under loading status 
Battery consumption rate for ERVs to 0.8 %/km 
travel under empty status 
Battery consumption rate for ERVs to 0.25 %/ m? 
conduct the pouring task 
RMC unloading rate 0.5 m/min 
Cost of opening the charging station 5 $ 
Unit cost for ERV charging 2 $/kWh 
Importance hyper-parameters a, 1 for objective 1, 0 for objective 2 / 
Importance hyper-parameters az 0 for objective 1, 1 for objective 2 / 
Reward for an invalid action rp -1000 / 
Reward for completing the task rf 1000 / 
Maximum pouring interval 6 90 min 
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Table 4 Information of the construction sites. 


No. RMC demand (m°) Distance (km) Start time (hr:mm) 
C1 6 6.2 8:30 
C2 60 4.0 8:40 
C3 26 5.5 9:30 
C4 3 3.4 10:40 
C5 64 12.0 11:20 
C6 64 4.1 10:00 
C7 24 6.6 15:10 


Table 5 Information of the ERVs. 


No. 1 2 3 4 5 6 7 8 
RMC Capacity (m°) 8 8 8 7 7 6 5 2 
Unit cost ($/min) 1.3 1.3 1.3 1.2 1.2 1.1 1.0 0.8 
Battery capacity (kWh) 160 160 160 160 160 120 100 50 
Initial battery capacity 80 80 80 80 80 60 50 25 
(kWh) 
Initial LAT (hr:mm 6:00 6:00 6:00 6:00 6:00 6:00 6:00 6:00 


3.2 Benchmark Setup 


To validate the performance of our proposed PMD-MCTS algorithm under the given scenario setup, we compared 
it with three benchmarks, including GA-based optimization from (Z. Liu, Zhang, & Li, 2014), and two ablation 
studies. All algorithms were run ten times to minimize the impact of random errors. The two most common metrics, 
namely a) the average reward and b) the average computation speed, were used as the first two evaluation criteria. 
To test the stability of the algorithm, three additional metrics were selected, namely c) the success rate, d) the 
standard deviation (SD) of the average reward, and e) the SD of the average computation speed. Instead of 
terminating the MDP process when an invalid action occurs, we adopted a great negative number as a penalty and 
continued the MDP simulation. To avoid a negative battery state, the negative battery level was modified to the 
smaller one between the current battery status and the minimal battery requirement. 


3.2.1 Genetic algorithm 


Three-layer chromosome: The chromosome structure was designed based on the concepts of (Karakatič, 2021; 
Z. Liu et al., 2014). As described in (Z. Liu et al., 2014), the maximum number of vehicles to be dispatched is 
fixed, which is set as the chromosome length. The chromosome of (Z. Liu et al., 2014) has three layers: a) sequence 
of construction site ID, b) sequence of the accumulative number of vehicles to the construction site, and c) 
sequence of vehicle ID. The second layer was removed as it can be inferred from the first layer. In addition, we 
added a layer for battery level according to (Karakatié, 2021), and used the same battery definition as our PMD- 
MCTS method. An illustration of the chromosome is shown in Fig. 3. 


The 2™ ERV is dispatched to the 
1“ construction site with 80% battery. 


STITT] 
ARNAN 
STIS [4101312] 


Fig. 3 An illustration of the GA chromosome 


Selection: The chromosome represents a set of sequential MDP actions that can be input into the MDP model to 
obtain the accumulative reward (fitness). It should be noted that action will not be taken if the target construction 
site has been satisfied. 


Crossover: This study adopted one-point crossover, but the crossover operation may change the maximum number 
of vehicles required by each site. Hence, the probability mapping method of (Z. Liu et al., 2014) was adopted for 
the first layer crossover, as shown in Fig. 4. Specifically, each gene in the first layer has a mapping probability, 
and the crossover is conducted on the probability layer. The new chromosome is generated by mapping the 
probabilities to a basic chromosome in descending order, and the basic chromosome can be user-defined 
({1,1,1,2,2,3,3] in Fig. 4). The crossovers of the second and third layers are conducted directly. We conduct the 
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crossover layer by layer, which can generate six children during one crossover. 


Fig. 4 The crossover of two chromosomes 


Mutation: One-point mutation is adopted, as shown in Fig. 5. Similar to the crossover operation, the mutation of 
the first layer is realized by probability mapping, while the genes in the other two layers are mutated according to 
their ranges. 
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Fig. 5 The mutation of the chromosome 


Table 6 Hyperparameters of genetic algorithm 


Hyperparameters Value 
Population size 20 
Parent number 3 
Mutation rate 0.3 
Maximum generation number 200000 


3.2.2 Ablation studies 


We have made two improvements based on WU-UCT-based MCTS. To evaluate the effectiveness of these 
improvements, we conducted two ablation studies: a) PD-MCTS, which is PMD-MCTS without action masking, 
and b) PM-MCTS, which is PMD-MCTS without decaying search strategy. The hyperparameters used in the 
MCTS algorithms are listed in Table 7. 


Table 7 Hyperparameters of MCTS algorithms 


Hyperparameters Value 
Number of expansion workers 8 
Number of simulation workers 16 

Maximum search step (Nso) 3000 
Maximum search depth 100 
Maximum search width 200 

Discount factor 0.9 
Expansion policy Random 


3.3 Results 


Experiments for PMD-MCTS and four benchmark algorithms were conducted on the designed scenario. The entire 
procedure was executed on a laptop with the specification of Intel 19-10980H 3.10GHz CPU and 32GB RAM. The 
the scheduling performance of each algorithm is shown in Table 9 and Table 8. 
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Table 8 Scheuling performance of each algorithm in objective_1 (minimizing the costs) 


Average reward SD of average Success Average SD of average 
reward rate (%) computation time computation 
(s) time 
GA -8313.1(-6494.3*) 1283.1 0 102.0 1.7 
PD-MCTS -6950.0 (-5421.5*) 1040.6 0 320.8 26.3 
PM-MCTS -3095.2 (-3004.5*) 120.4 100 385.6 40.1 
PMD-MCTS -3064.0 (-2862.4*) 123.0 100 260.0 19.9 
Table 9 Scheuling performance of each algorithm in objective 2 (minimizing the delay) 
Average reward SD of average Success Average SD of average 
reward rate (%) computation time computation 
(s) time 
GA -837.9 (479.4*) 1076.5 30 93.27 2.0 
PD-MCTS -484.1 (182.8*) 503.3 100 497.8 78.1 
PM-MCTS 969.0 (1000*) 18.2 100 292.4 10.4 
PMD-MCTS 991.6 (1000*) 6.7 100 203.0 1.6 


* indicates the optimal performance 


The empirical results indicate that our PMD-MCTS algorithm demonstrates superior performance, achieving the 
highest rewards across both objectives. Specifically, the average reward of the PMD-MCTS in objective 1 is 3064.0, 
which translates to an average cost of $4064.0. In objective 2, the average reward of the PMD-MCTS is 991.6, 
representing an average delay of 8.4 minutes, and the most optimal solution can eliminate any delay entirely. 
Furthermore, our findings suggest that only algorithms implementing action masking (i.e., PMD-MCTS and PM- 
MCTS) can consistently ensure a feasible solution for both objectives. These two algorithms also display the 
smallest standard deviation of average reward, indicating their superior stability. Although PM-MCTS exhibits a 
performance similar to PMD-MCTS in terms of reward and success rate, the PMD-MCTS requires 30% less 
computational time than PM-MCTS. 


4. DISCUSSIONS 


This study's outcomes substantiate the effectiveness of our proposed scheduling optimization approach for 
managing ERV operations. This methodology mainly contributes to the field in three ways. 


Firstly, this study addresses an existing gap in on-road Commercial Electric Vehicle (CEV) research. We are 
pioneers in examining CEVs, particularly on-road CEVs, marking a significant stride towards sustainable 
advancement in the construction sector. By incorporating the demands of Ready-Mixed Concrete (RMC) 
dispatching and Electric Vehicles (EVs), we have holistically examined the characteristics of ERVs. This problem 
definition can potentially be extrapolated to other CEV studies in the future. Secondly, we have crafted a novel 
formulation for the RMC delivery problem, utilizing the Markov Decision Process (MDP) based on the temporal 
dynamics of the RMC delivery process. Compared to its predecessors, the MDP formulation is a more rational 
choice as it facilitates sequence decision-making. This approach prevents invalid decisions at each stage and 
ensures the decision-making process is far-sighted, considering all decisions in a comprehensive manner. Lastly, 
we introduced an enhanced Monte Carlo Tree Search (MCTS) algorithm, named PMD-MCTS, to optimize the 
ERV operation process. When compared with four benchmark algorithms, it proved to be the most effective. Two 
key advantages of the PMD-MCTS were identified: Both PMD-MCTS and PM-MCTS displayed superior 
performance in terms of average reward and success rate, outperforming the Genetic Algorithm (GA) by 
employing the MCTS optimization strategy. The GA algorithm fails to ensure a feasible solution for both objectives, 
owing to its limitations in managing sequential requirements. PMD-MCTS surpassed PM-MCTS on computational 
speed. Our PMD-MCTS saves over 30% of the computational time required by PM-MCTS, without compromising 
on accuracy, by implementing a decaying strategy. 
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5. CONCLUSION 


In the face of pressing concerns over carbon emissions, the construction industry can expect to see an influx of 
more sustainable technologies. Electric Ready-mixed Vehicles (ERVs) are a promising technology geared towards 
enhancing the sustainability of the construction industry. However, the interdisciplinary nature of ERVs has led to 
a considerable gap in this field. This study addressed this gap by proposing a scheduling optimization methodology 
for ERV dispatching. It introduces a systematic problem definition for the ERV operation, which integratively 
considers the properties of both EVs and RMC delivery tasks. Moreover, the ERV operation process is modelled 
as an MDP problem, thereby breaking down the entire process into sequential sub-processes. The proposed PMD- 
MCTS algorithm, equipped with parallel computing, invalid action masking, and decaying searching capability, 
has been validated through a meticulously designed experiment. This study, thus, provides a comprehensive 
evaluation of ERV operations and offers a solid foundation for future research in this domain. 
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ABSTRACT: Recently there has been a strong interest in using semantic technologies to improve information 
management in the construction domain. Ontologies provide a formalized domain knowledge representation that 
provides a structured information model to facilitate information management issues such as formalization and 
integration of construction workflow information and data and enables further applications such as information 
retrieval and reasoning. SPAROL Protocol And RDF Query Language (SPAROL) queries are the main approaches 
to conduct the information retrieval from the Resource Description Framework (RDF) format data. However, there 
is a barrier for end users to develop the SPAROL queries, as it requires proficient skills to code them. This 
challenge hinders the practical application of ontology-based approaches on construction sites. As a generative 
language model, ChatGPT has already illustrated its capability to process and generate human-like text, including 
the capability to generate the SPARQL for domain-specific tasks. However, there are no specific tests evaluating 
and assessing the SPARQL-generating capability of ChatGPT within the construction domain. Therefore, this 
paper focuses on exploring the usage of ChatGPT with a case of importing the Digital Construction Ontologies 
(DiCon) and generating SPARQL queries for specific construction workflow information retrieval. We evaluate 
the generated queries with metrics including syntactical correctness, plausible query structure, and coverage of 


correct answers. 


KEYWORDS: Semantic web, Ontology, ChatGPT, SPAROL, RDF, Information retrieval, Construction 


1. INTRODUCTION 


Construction is an information-intensive and dynamic industry, which requires effective information management 
and exchange. Especially, construction professionals need always to retrieve demanding information about the 
construction process to be aware of the prompt situation to support their decision-making and action-taking (Akinci, 
2015). With the ongoing advancements of digital implementation in the construction domain, a large amount of 
semantic data can be collected from heterogeneous systems, which requires a systematic solution to formalize, 
integrate, and manage the data (Kosovac et al., 2000). Therefore, researchers in the construction domain have 
investigated the application of semantic web technologies to facilitate information formalization and 
interoperability issues (Zhou et al., 2016). For example, numerous ontologies have been developed in the 
construction domain, to provide a formalized construction domain knowledge representation and comprehensive 
semantic vocabulary. These ontologies support the conversion of construction information into Resource 
Description Framework (RDF) format and the establishment of an integrated semantic graph database. 


Such integrated semantic graph database has been proven to the advantageous in the application of information 
integration, reasoning, and retrieval by academic scholars in the construction domain (Akinyemi et al., 2018). 
However, in terms of the practical implementation, one drawback of the graph database can be identified. To 
retrieve the demanding information from the graph database, the SPARQL Protocol and RDF Query Language 
(SPARQL) queries are needed. However, the construction sector has limited practitioners with sufficient 
knowledge of ontology and proficient skills to code the SPARQL queries at the moment. This makes it difficult 
for end-users to interact directly with the graph database and retrieve the particular construction information they 
need. While there have been various attempts to employ web-based lists or templates to assist users in crafting 
SPARQL queries without coding skills, generating SPARQL queries quickly and easily remains a challenge for 
end-users. Given this challenge, it would be beneficial to develop an intuitive and self-explanatory method that 
allows end-users to create SPARQL queries more easily. 


ChatGPT is an Al-empowered language generation model, which uses the Transformer algorithm and large 
language model (LLM) principle as the basis to generate human-like text based on the given prompts and context 
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with rapid response time (van Dis et al., 2023). After its public release at the end of 2022, ChatGPT received huge 
attention from academia, industries, and consumers. ChatGPT also can analyze and generate structured syntax- 
based contents such as codes, scripts, and ontology syntax (Lin et al., 2023). ChatGPT can also generate the 
SPARQL query sentences based on the predefined ontology inputs since its database involves numerous examples 
of SPARQL and ontologies in the OWL representation (Tan et al., 2023). Based on this feature, using ChatGPT 
could be considered to be an alternative approach for information retrieval for the RDF data, which would be easy 
and simple to use. ChatGPT can also be used to directly retrieve information from a provided ontology, including 
instance data, without using SPARQL queries. However, due to the private and sensitive nature of company and 
construction-related instance data, it is imperative not to share this kind of data with a self-learning LLM model. 


However, the accuracy and real capability of ChatGPT to generate practical SPARQL queries for achieving specific 
information retrieval in the construction domain have not been tested. To assess whether if it is a feasible solution 
to aid construction information retrieval tasks, in this paper, we aim to test and evaluate the current capability of 
ChatGPT to generate the SPARQL queries for accomplishing domain-specific construction information retrieval 
tasks. The Digital Construction Ontologies (DiCon) (Zheng et al., 2021) previously developed by our research 
group is used as a case study for the test. We first feed the DiCon ontologies to the ChatGPT for the initial 
ontological parse as the fundamental context of generating the SPARQL queries. We experimented on the 
generated SPARQL with four different scenarios and used the metrics of syntactical correctness, plausible structure, 
and the coverage of the correct result to assess the generated SPARQL queries. 


The paper is structured as follows. Section 2 provides a review of related works of ontologies and ChatGPT. 
Section 3 introduced the research methodology and the architecture of the DiCon-ChatGPT system. In Section 4 
the tests and results are illustrated. This is followed by the discussion, limitation, and future research in Section 5. 
Finally, in section 6 the conclusion of the paper is given. 


2. BACKGROUND 
2.1 Semantic Web and construction information 


Data and information are the key resources of the construction industry to guarantee smooth collaboration for the 
operations. However, data and information are also influenced by the segmented nature of the construction industry 
and the diverse software solutions in use. The data is generated in isolated systems by the different stakeholders in 
the different disciplines and is usually formed into various file formats (Kosovac et al., 2000). Such interoperability 
issue is notorious in the construction domain. The semantic web and associated Linked Data concept is considered 
as an approach to facilitate the information interoperability problem, which provides technical standards for a 
comprehensive heterogeneous information integration and machine-readable representation of the data for further 
implementation (Beetz et al., 2021). 


Ontologies serve as the foundation of the semantic web, which provides a shared and machine-readable 
conceptualization of domain knowledge and could be further used as a data structure to formalize and integrate 
data and information. Recently, ontologies have become increasingly relevant in the construction domain, to 
address challenges like data integration and knowledge management (Pauwels et al., 2017). Various iconic domain 
ontologies have been created in the construction field, including generic ones like e-COGNOS (El-Diraby et al., 
2011) and IC-PRO-Onto (El-Gohary et al., 2010). Our research team also developed the DiCon, which defines 
construction workflow-related entities with the Semantic Web Ontology Language (OWL) representation and 
achieves integrating data from diverse systems. DiCon also aligned with other ontologies such as IFCOWL 
(Pauwels, 2016), the Building Topology Ontology (BOT) by Rasmussen et al. (2020), and SOSA/SSN ontology 
(Janowicz et al., 2019) to link the construction information with building information modeling, topological, and 
sensor data. 


The foundational language of the semantic web is the Resource Description Framework (RDF). The data structure 
of RDF is organized into a triple format of a subject, predicate, and object, or subject, property, and value (Manola 
et al., 2004). The RDF triples constitute an RDF graph and store it in RDF graph stores for further utilization such 
as information retrieval (Hitzler et al., 2008; Allemang et al., 2020). SPARQL Protocol and RDF Query Language 
(SPARQL) is a semantic graph query language specifically designed to query RDF data (W3C, 2013). SPARQL 
allows users to query RDF data by specifying patterns to match against the triples in the RDF graph. Utilizing an 
RDF-based ontology together with SPARQL can significantly enhance the efficiency of information extraction. 
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2.2 ChatGPT and related works 


ChatGPT is a state-of-art large language model (LLM) developed by OpenAI. The G to refers to generative, which 
means the ability to generate human-like responses and demonstrate a level of language understanding that has 
been groundbreaking in the field of natural language processing. P refers to pre-trained because the model is pre- 
trained on a massive dataset containing a diverse range of text from the internet. The T refers to Transformer, the 
architecture of a deep-learning model designed for natural language processing tasks (van Dis et al., 2023). It 
learns to predict the next word in a sentence, which enables it to understand grammar, context, and semantics in 
the text. Currently, ChatGPT has four versions, including GPT-1, GPT2, GPT3.5, and the latest GPT4. 


As an AI language model, ChatGPT's responses are based on patterns learned from a diverse range of data during 
training, which includes general information on ontologies, SPARQL, and RDF. Therefore, ChatGPT also can 
collaborate with Semantic Web technologies. Several scholars have explored the combination of ChatGPT with 
semantic web technologies in different directions. Lin et al. (2023) involved context-based ontology modeling 
with ChatGPT to represent database semantics in natural language representation for supporting database 
management in data integration. Tan et al. (2023) assessed the capability of ChatGPT to conduct knowledge-based 
question-answering with generated SPARQL queries. The experimental results showed that ChatGPT is a 
promising tool for question-answering under continuous updating and iterating of the model. Meyer et al. (2023) 
conducted a set of experiments using ChatGPT with the knowledge of graph engineering. The result showed 
ChatGPT has a remarkable capability to support knowledge graph engineering in constructing knowledge graphs, 
translating natural language queries into precise and organized SPARQL queries, tailored to the provided 
knowledge graphs, and diagrams illustrating expansive schemas of knowledge graphs. 


In summary, the existing tests indicate that ChatGPT can generate SPARQL queries for information retrieval tasks. 
Therefore, ChatGPT could be a potential tool for the automated generation of SPARQL queries for construction 
information retrieval and management. However, the previous related works focus on assessing the capability of 
ChatGPT in general Linked Data and knowledge graph domains. The capability of ChatGPT to generate domain- 
specific SPARQL queries for construction information retrieval has not been explicitly tested or validated. 
Therefore, in this paper, we aim to test and evaluate the current capability of ChatGPT to generate SPARQL queries 
for retrieval-specific construction workflow information. 


3. METHODOLOGY 
3.1 Research design 


To achieve the identified research objective, the research is designed as shown in Fig.1. We set up an experimental 
case study of feeding the DiCon ontologies to ChatGPT to test its current capability of generating the SPARQL 
for retrieval construction information. The results of the experiment are the generated SPARQL queries. To 
evaluate these queries, we followed Meyer et al. (2023) who selected syntactical correctness, plausible structure, 
and the coverage of the correct result as three metrics to assess the generated SPARQL queries. First, syntactical 
correctness aims to check whether the query is following the correct syntax of SPARQL. A query is considered 
syntactically correct if it can be executed by an SPARQL engine without encountering errors. Second, the plausible 
structure is used to assess if the query has missing prefixes, wrong use of the classes and properties, or ChatGPT 
creates the classes and properties out of the given ontologies. The plausible structure is evaluated by manual 
assessment. Third, the coverage of the correct results is investigated by comparing the query result of the generated 
SPARQL with the ground truth data. 


To evaluate the capability of the ChaptGPT in different scenarios four tests are designed. In the first test, we asked 
the ChatGPT to create SPARQL queries for direct construction information retrieval. The second test was to ask 
ChatGPT to generate the SPARQL queries based on different natural language expressions of prompts. In terms 
of the third test, we asked ChatGPT to generate CONSTRUCT-type SPARQL queries based on updating 
information based on the given ontology. Finally, we guide ChatGPT to refine the SPARQL queries which are not 
performing well, to assess the performance of the refinement. Finally, the results of the four tests are assessed 
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based on the previously defined metrics. 
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Fig. 1: Research design 
3.2 The technical architecture of the experiment 


To conduct the above pre-defined tests, we set up a technical architecture of the experiment following the ChatGPT 
prompt engineering guideline (White et al., 2023) shown in Fig.2. First, due to the large size and high complexity 
of the original DiCon ontologies, in tests we created a subset of the DiCon ontology contains all essential classes 
and properties that mapped with the example data graph. Such a process could keep the essence of the ontology 
but reduce the cost of ChatGPT tokens. ChatGPT 4.0 is the latest model that can directly read textual files (OpenAI, 
2023). Thus, we select this version and feed the ontology subset file in Turtle format to ChatGPT 4.0 as the prime 
prompt. Based on the context feature of the ChatGPT, further SPARQL generation experiments can use the prompt- 
input ontology subset as the basis of generating the SPARQL queries. Then, the predefined four tests are conducted 
with the different prompt inputs. The generated SPARQL queries are also used against the instance data stored in 
GraphDB, a commercial graph store platform by Ontotext (Ontotext, 2023), to retrieve the target information as 
an evaluation of the syntactical correctness and coverage of the correct result. 
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Fig. 2: Technical architecture of the experiment 


4. TEST FOR DICON-CHATGPT TO GENERATE SPARQL QUERIES 


In this section, a detailed description and result of the aforementioned tests are demonstrated. We also obtained the 
practical data from previous projects as the ground-truth data to test the SPARQL queries answering. 


4.1 Test 1: SPARQL query generation for direct information retrieval 


In our first test, we wanted to determine the general capability of ChatGPT to generate SPARQL queries based on 
the given DiCon ontology subset, to accomplish direct construction information retrieval task. Thus, we asked 
ChatGPT with Prompts 1.1 and 1.2 to retrieve essential construction information. Both Prompts only utilize 
terminologies from the DiCon (Zheng et al., 2021). 
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Prompt 1.1: Based on the given ontology, create a SPAROL query to find the activity and its related agent and 
location. 


Prompt 1.2: Based on the given ontology, create a SPARQL query to find the activity information about its start 
time and end time. 


Prompt 1.1 aims to retrieve information about construction activity, including its assigned location and agent. This 
prompt is simple in that in the DiCon class a dicp:Activity has direct properties of dicp:hasLocation and 
dica:hasAgent towards dice:Location and dica:Agent. Prompt 1.2 aims to extract the information of the start and 
end time of the activity. This prompt has an indirect relationship between activities and their start or end times. 
Because in DiCon, we define the property dicp:occupiesTimelInterval to represent the temporal information of an 
activity. The range of the dicp:occupiesTimelnterval is a dice:Timelnterval, which has beginning and end to 
dice: TimelInstant to indicate as the start and end time. 


Each of the prompts was asked five times, and the generated results are partially shown in the Appendix due to the 
length of the paper. In terms of Prompt 1.1, for the five times of the generation, there are four times the ChatGPT 
uses the correct terminologies in the provided ontology. One time, it was not sure if dicp: location was the correct 
property to use, thus it defined a location property but with the wrong prefix dice: in the query. For Prompt 1.2, 
ChatGPT generates only one own property to describe the start and end time of an activity. To check the coverage 
and syntactical correctness of the query, we queried the generated SPARQL to GraphDB with the example data 
graph. For both of the prompts, all the queries can be successfully executed by the SPARQL engines of GraphDB, 
which confirms all the queries are syntactically correct. In terms of coverage, all the queries with the correct 
structure can query the correct answer from the example graph. The metrics results for Prompts 1.1 and 1.2 are 
listed in Table 1. 


Table 1: Metric results of generated SPARQL queries in Test 1. 


Prompt Metric Result 
1.1 Syntactical correctness 5/5 
Coverage of the correct result 4/5 
Plausible query structure 4/5 
1.2 Syntactical correctness 5/5 
Coverage of the correct result 4/5 
Plausible query structure 4/5 


4.2 Test2: SPARQL query generation from different natural language expressions 


The second test intends to evaluate the capability of the ChatGPT to generate SPARQL queries based on different 
natural languages but with the same information retrieval target as Prompt 1.1. This test also analogs the practical 
nature and scenario that the different end users may have different natural language expressions for the prompts. 


Prompt 2.1: Based on the given ontology, create a SPAROL query that lists the agent and location of an activity 
Prompt 2.2: Based on the given ontology, create a SPARQL query that lists the worker, workplace of an activity 


In terms of Prompt 2.1, we provided a different expression of Prompt 1.1 but still used the terminologies from the 
DiCon. In terms of Prompt 2.2, we use similar terminologies that have been mixed usually in the construction 
domain from describing the classes, to check if the ChatGPT can generate SPARQL with the terminologies out of 
the ontology. Each of the prompts was asked for five times. For Prompt 2.1, four times it generated the same 
queries as Prompt 1.1, and one time it used the wrong prefix of the dicp: location property. All the queries can be 
executed in GraphDB, the query with the erroneous prefix returned no location data, while the other four returned 
correct answers. For Prompt 2.2, only one generated SPARQL was correct to link the term ‘worker’ to the class 
agent and the term ‘workplace’ to the location. ChatGPT created the classes of worker and workplace one time, 
and the other three times it managed to link the term ‘worker’ to the class dica: Agent but failed to link the term 
‘workplace’ to the class dice:Location. All of the generated queries are syntactically correct, but only one query 
provided us with the correct answer. The metrics results are shown in Table 2. 


Table 2: Metric results of generated SPARQL queries in Test 2. 


Prompt Metric Result 
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2.1 Syntactical correctness 5/5 


Coverage of the correct result 4/5 
Plausible query structure 4/5 
2.2 Syntactical correctness 5/5 
Coverage of the correct result 1/5 
Plausible query structure 1/5 


4.3 Test 3: Infer construction information with SPARQL 


SPARQL is not just limited to querying data by using the SELECT statement. It also provides other functionalities 
such as INSERT, UPDATE, and CONSTRUCT statements to update or create new graph contents (W3C, 2013). 
Therefore, this test aims to evaluate the capability of ChatGPT to generate a SPARQL query using the 
CONSTRUCT statement to infer additional construction information by creating a new RDF graph. We provide 
the Prompt 3 to ChatGPT: 


Prompt 3: Based on the given ontology, create a SPAROL query to construct a new graph, in which if the activity 
has an agent, construct new triples that the agent "is an agent in" the activity. 


This prompt is based on the predefined ontology that dica:hasAgent property has the inversed property of 
dica:isAgentIn, which was not included in the example data graph. This test aims to assess whether ChatGPT can 
generate CONSTRUCT SPARQL queries that utilize the inversed properties in the ontology to construct new 
graphs. Similar to the previous tests, Prompt 3 was also asked five times, and we analyzed the syntactic correctness 
and coverage of the generated queries. The evaluation results for the prompt are listed in Table 3. The metrics 
results show all the generated queries with plausible query structures, are syntactically correct, and cover the 
correct result. 


Table 3: Metric results of generated SPARQL queries in Test 3. 


Prompt Metric Result 
3 Syntactical correctness 5/5 
Coverage of the correct result 5/5 
Plausible query structure 5/3 


4.4 Test 4: Refinement of SPARQL query 


In terms of Prompt 2.2, which did not perform well in the test 2. Therefore, in this test, we try to use the contextual 
feature of the ChatGPT to refine the result by providing more explicit prompts based on the ontology structure. 
We provide the prompt P4 to the ChatGPT: 


Prompt 4: Refine the previous SPARQL query based on the given ontology, that a “workplace” should link to the 
class dice:Location. 


Similar to the previous tests, the refinement was also conducted five times, and we analyzed the plausible query 
structure, syntactic correctness, and coverage of the generated queries. After the refinement, the structure of the 
query has been significantly improved. All of the queries use the correct terminologies from the provided ontology 
without misused prefixes, are conductible, and can generate correct answers. The analyzed results for the Prompt 
4 are listed in Table 4. 


Table 4: Metric results of generated SPARQL queries in Test 4. 


Prompt Metric Result 
4 Syntactical correctness 5/5 
Coverage of the correct result 5/5 
Plausible query structure 5/5 


756 


SECTION C - Al, DATA SCIENCE AND ANALYTICS 


5. DISCUSSION 


Corresponding to the above four tests and their results, we assess the current capability of the ChatGPT with three 
predefined metrics. The overall results of different prompts with three metrics as shown in Fig.3. In the following, 
we will provide a detailed summary and discussion of the experiment results. First, in terms of syntactical 
correctness, it is obvious from the experimental results that all the ChatGPT-generated queries in the experiment 
avoid syntax errors. Such absence of syntactical errors across all generated queries proves ChatGPT comprehends 
and applies the syntactic rules inherent to SPARQL. This accomplishment is not merely an incidental outcome, 
but a demonstrable indication of the underlying capabilities that ChatGPT harnesses in generating high-quality, 
error-free SPARQL queries. 


Syntactical correctness Coverage of the correct result Plausible query structure 
y rag Y 


5 
4 
o 
P1.1 P12 P2.1 P22 P3 Pá 


Prompts 


Resut 


Fig. 3: Overall metrics result of the generated SPARQL for different prompts 


Second, when examining the plausible query structure, the detailed analysis revealed that some of the generated 
results contain structure mistakes. For example, both Prompts 1.1 and 1.2 have one generated query that did not 
use the correct property in the given ontology but defined a new property as an assumption to complete the query. 
Although the used terminology is compliant with DiCon, the prefix used was incorrect. For Prompt 2.2, when 
faced with terminologies outside the ontology, performance had reduced significantly. In this case, it generates 
only one correct query using the classes and properties defined in the given ontology. The term 'worker' was 
successfully mapped to the class dica:Agent, but the system failed to map 'workplace' to the class dice:Location 
and its associated properties. In test 4, by providing a more explicit mapping instruction prompt to ChatGPT as a 
refinement, the performance improved. This result reveals that the current ChatGPT still requires explicit prompts 
using the terminologies defined in the ontology to ensure the generation of plausible queries. 


Third, the test results indicate a significant and noteworthy correlation between the structural accuracy of the 
queries and the subsequent performance of the AI-generated queries in producing accurate query results. In essence, 
this observation highlights the pivotal role that query structure plays in determining the efficacy of Al-generated 
queries. Meanwhile, if the generated result is unsatisfactory, ChatGPT can refine the generation by providing new 
prompts with the correct information to fix the errors. 


In summary, the current version of ChatGPT has demonstrated impressive capabilities. It successfully translated 
natural language questions into syntactically correct SPARQL queries for the DiCon ontology. A detailed analysis 
revealed some mistakes in the generated results, which can be refined with extra explicit prompts, to elaborate 
more precise instructions. 


6. LIMITATION 


This research is just the first research of our research teams to explore the usage of Natural Language Processing 
(NLP) tools to combine and support semantic construction informatics. Admittedly, this research has the following 
limitations. First, this research is limited by testing scale. The performance of ChatGPT with entire DiCon 
ontologies has not been tested and limited numbers of tests have been made. In the future, the test volume and 
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velocity should be enlarged to improve the accuracy of the research. In the future, the full DiCon ontology suite 
will be also tested with ChatGPT and more comprehensive prompts. Additionally, as DiCon is aligned with other 
ontologies, such as Building Topology Ontology (BOT), SOSA/SSN, IFCOWL, etc., these ontologies will also be 
included in the upcoming more complex tests. Second, this research is only tested using ChatGPT. Currently, 
ChatGPT is the most used NLP tool and is easy to deploy and test. Although ChatGPT is under continuous updating, 
as a commercial solution, its closed source and black-box nature makes it difficult to optimize and train. Therefore, 
besides the ChatGPT, in future research, other LLM models and NLP tools should be also tested and explored, for 
example, the Falcon LLM (Technology Innovation Institute, 2023). More construction domain-specified training 
based on the open-sourced LLM will also be conducted. Third, the prompts themselves would affect the generated 
query results. Throughout this research, the prompts provided to ChatGPT have been intentionally kept simple and 
straightforward to gauge ChatGPT's performance in generating SPARQL queries. However, recognizing the 
multifaceted nature of semantic querying and the potential intricacies within construction information, future 
studies will necessitate a broader spectrum of test cases with more specific construction information retrieval tasks. 
By incorporating more intricate prompts, further research will better evaluate ChatGPT's capacity to handle 
complex query generation tasks in the construction domain. Another research track is also studying the prompt 
development manner based on the readability score. 


7. CONCLUSION 


This paper tested the capability of ChatGPT to generate SPARQL queries with a case study based on the DiCon 
ontologies for construction workflow information retrieval. We designed and conducted a set of tests to assess the 
current capability of ChatGPT to generate the SPARQL query for solving the construction domain-specific task of 
information retrieval. 


Overall, as observed from the result, ChatGPT has demonstrated its impressive capability of generating SPARQL 
queries for the given ontology to retrieve target information on the construction domain level. The experiments 
show that ChatGPT can avoid syntax errors, and read and utilize the given ontologies as the basis of SPARQL 
generation. However, the execution of the test process and evolution of the generation result also reveal several 
current limitations of ChatGPT in generating SPARQL queries. For example, ChatGPT cannot fully understand 
the given ontology, and the quality of the generated SPARQL queries highly relies on the explicitness of the given 
prompts. 


In future research, we would also continue the exploration of using LLM solutions, including ChatGPT and other 
tools, to combine with the Semantic Web implementation in the construction domain. One future research topic is 
to illustrate the construction information, from SPARQL queries generated by Chat-GPT on Extended Reality 
devices. Simultaneously, we would also test the feasibility of ChatGPT to create RDF converters, which will 
convert tabular data into RDF format mapped with the DiCon ontology. 
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APPENDIX: Prompts and Example ChatGPT SPARQL results 


Prompts Example result 
PREFIX dicp: <https://w3id.org/digitalconstruction/0.5/Processes#> 

Prompt 1.1: Based on the given ontology, create a SPAROL query PREFIX dica: <https://w3id.org/digitalconstruction/0.5/A gents#> 
PREFIX dice: <https://w3id.org/digitalconstruction/0.5/Entities#> 
SELECT ?activity ?agent ?location 
WHERE { 

activity rdf:type dicp:Activity . 

activity dica:hasAgent ?agent . 

activity dice:hasLocation ?location . 
} 
PREFIX dicp: <https://w3id.org/digitalconstruction/0.5/Processes#> 
Prompt 1.2: Based on the given ontology, create a SPAROL query PREFIX dice: <https://w3id.org/digitalconstruction/0.5/Entities#> 
SELECT ?activity ?startTimelInstant ?endTimelnstant 
WHERE { 


to find the activity and its related agent and location. 


to find the activity information about its start time and end time. 
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activity rdf:type dicp:Activity . 
activity dicp:occupiesTimelInterval ?timeInterval . 
?timeInterval dice:hasStart ?startTimelnstant . 
?timelnterval dice:hasEnd ?endTimelnstant . 
} 
PREFIX dicp: <https://w3id.org/digitalconstruction/0.5/Processes#> 
Prompt 2.1: Based on the given ontology, create a SPAROL query PREFIX dica: <https://w3id.org/digitalconstruction/0.5/A gents#> 
PREFIX dice: <https://w3id.org/digitalconstruction/0.5/Entities#> 
SELECT ?activity ?agent ?location 
WHERE { 
activity rdf:type dicp:Activity . 
activity dica:hasAgent ?agent . 
activity dice:hasLocation ?location . 
} 
PREFIX dicp: <https://w3id.org/digitalconstruction/0.5/Processes#> 
Prompt 2.2: Based on the given ontology, create a SPAROL query PREFIX dica: <https://w3id.org/digitalconstruction/0.5/A gentsi#> 
PREFIX dice: <https://w3id.org/digitalconstruction/0.5/Entities#> 
SELECT ?activity ?worker ?workplace 
WHERE { 
activity rdf:type dicp:Activity . 
?activity dica:hasAgent ?worker . 
activity dice:hasLocation ?workplace . 
?worker rdf:type dica:Worker . # Assuming "Worker" is a subclass 
or type of Agent 


that lists the agent and location of an activity 


that lists the worker, workplace of an activity 


PREFIX dicp: <https://w3id.org/digitalconstruction/0.5/Processes#> 

Prompt 3: Based on the given ontology, create a SPAROL query PREFIX dica: <https://w3id.org/digitalconstruction/0.5/A gents#> 
to construct a new graph, in which if the activity has an agent, CONSTRUCT { wad 
g j , : ?agent dica:isAgentIn ?activity . 


construct new triples that the agent "is an agent in" the activity. } 
WHERE { 
?activity dicp:hasAgent ?agent . 
} 
PREFIX dicp: <https://w3id.org/digitalconstruction/0.5/Processes#> 
Prompt 4: Refine the previous SPAROL query based on the given PREFIX dica: <https://w3id.org/digitalconstruction/0.5/Agents#> 
PREFIX dice: <https://w3id.org/digitalconstruction/0.5/Entities#> 
SELECT ?activity ?agent ?location 
dice: Location. WHERE { 
activity rdf:type dicp:Activity . 
activity dica:hasAgent ?agent . 
activity dice:hasLocation ?location . 


} 


ontology, that a “workplace” should link to the class 
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ABSTRACT: The construction industry has long been recognized for its complex safety regulations, which are 
essential to ensure the well-being of on-site employees. However, navigating these regulations and ensuring 
compliance can be challenging due to the volume and complexity of the documents involved. This study proposes 
a novel approach to extracting information from construction safety documents utilizing Large Language Models 
(LLM), called CSQA, to provide real-time, precise answers to queries related to safety regulations. The approach 
comprises three modules: (1) the construction safety investigation module (CSI) collects safety regulations for 
building the information needed. By leveraging a collection of safety regulation PDFs, the system follows a process 
of text extraction, preprocessing, and global indexing for efficient search. (2) The safety condition identification 
module (SCI) retrieves the CSI database; after that, the LLM, with its extensive training, processes user queries, 
searches the indexed regulations, and retrieves pertinent information. (3) the safety information delivery (SID) 
would provide the answer to the user and incorporate a feedback mechanism to further refine system accuracy 
based on user responses. Preliminary evaluations reveal the system's superior performance over traditional search 
engines, owing to its ability to grasp query context and nuances. The CSQA presents a promising method for 
accessing safety regulations, with potential benefits including reduced non-compliance incidents, enhanced 
worker safety, and streamlined regulatory consultations in construction. 


KEYWORDS: Construction safety document, extraction, LLM. 


1. INTRODUCTION 


Safety has consistently been seen as a vital concern within the construction industry. Workplace safety 
catastrophies can result in major loss of life and damage to property with severe repercussions (S. V.-T. Tran et al., 
2023; S. V. T. Tran et al., 2021). According to the latest statistics from the Occupational Safety and Health 
Administration (OSHA Fatality Report, n.d.), the construction industry witnessed an annual total of 1,008 fatalities 
in 2020. Notably, falls from elevated positions constituted around thirty-three percent of these. According to data 
from Statistics Korea (Construction Work | Statistics Korea, n.d.), the construction business in South Korea 
accounted for more than 50% of all fatal accidents within the industry. To prevent accidents at construction sites, 
several scholars and professionals have demonstrated that implementing enhanced safety measures in the 
workplace might reduce and prevent accidents (Bao et al., 2022; S. V. Tran et al., 2022; S. V. T. Tran et al., 2022). 
Therein, field compliance checking is a crucial endeavor to identify non-compliance with construction safety 
standards, with the primary objective of safeguarding employees against potential safety events (Jeong et al., 2023; 
Kang et al., 2023). 


Analyzing construction safety documents with natural language processing (NLP) techniques enables automatic 
information extraction of safety requirements. For instance, Feng and Chen (Feng & Chen, 2021) proposed a 
framework based on deep learning to extract event-related information (e.g., date, location, and type of accident) 
from accident news reports for construction safety management. Rupasinghe and Panuwatwanich (Rupasinghe & 
Panuwatwanich, 2021) proposed a rule-based technique for extracting information about hazards from accident 
reports. Baker et al. (Baker et al., 2020) suggested employing NLP (a collection of text patterns) to uncover injury 
precursors. These works together focused on either the study of injury and accident records or the extraction of 
hazard variables. Despite these studies, there is a dearth of research aimed at automatically extracting requirements 
from construction safety rules in order to enable field compliance. Besides, the information extraction should 
provide users with precise and timely responses to their inquiries within human natural language. 


Large Language Models (LLM) have emerged as a game-changing technology, displaying extraordinary ability in 
natural language processing jobs. Incorporating LLMs into construction safety provides a distinct benefit in its 
capacity to customize to particular, project-centric data. This is especially important given the vast volumes of 
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private paperwork that projects often require. Every project in the construction environment is unique, with its 
own blueprints, safety regulations, and vendor-specific rules, often encased inside PDFs and other digital forms. 
LLMs have the capacity to be trained or fine-tuned on project-specific datasets. Once a company uploads its 
confidential documents, the LLM can absorb this data, guaranteeing that when queries are posed, the solutions are 
general and suited to the context of that specific project's data. 


This research proposes CSQA approach, a unique method for extracting construction safety documentation using 
Large Language Models (LLM), to answer real-time safety regulatory questions and fill the knowledge gap for 
industry experts. The method has three parts: (1) The construction safety investigation module (CSI) gathers 
building safety rules. The system uses safety regulation PDFs for text extraction, preprocessing, and global 
indexing for efficient search. (2) The safety condition identification module (SCI) obtains the CSI database, then 
the LLM analyzes user queries, examines the indexed rules, and retrieves relevant information with its thorough 
training. (3) Safety information delivery (SID) would address the user and offer feedback to improve system 
accuracy depending on user replies. Section 2 discusses the current state of construction safety information 
retrieval and LLM. Section 3 will present the recommended approach. The authors produce case scenarios in 
Section 4 to validate the approach. Subsequently, the discussion and conclusions of the study are presented. 


2. LITERATURE REVIEW 
2.1 Current state of construction safety information retrieval and extraction 


Over the years, the construction industry, renowned for its complex projects and the resulting safety imperatives, 
has accrued a vast repository of safety regulations, guidelines, and best practices. Traditionally, retrieving and 
extracting relevant safety information was primarily a manual process (Zhong et al., 2020). Professionals 
frequently find themselves navigating through extensive physical binders or digital documents. This approach, 
while exhaustive, is fraught with difficulties. Due to the time-consuming nature of manual searches and the 
possibility of human error, there are frequent voids in the incorporation of vital safety directives(S. V. T. Tran et 
al., 2021). Moreover, the dynamic nature of construction projects, with their distinct challenges and parameters, 
necessitates a customized understanding of safety regulations, which manual searches cannot provide 
efficiently(Wu et al., 2022). 


Efforts have been made since the advent of the digital age to expedite this procedure (S. V. T. Tran et al., 2021). 
Initially, safety information was migrated to digital databases, enabling keyword-based searches. Even though this 
change facilitated the retrieval process to some degree, it was not without limitations. Keyword searches frequently 
return many results, necessitating additional sorting to locate relevant information. The lack of contextual 
comprehension and the static nature of these databases provided a wealth of information without the nuanced 
interpretation required for specific project scenarios. For instance, Feng and Chen (Feng & Chen, 2021) proposed 
a framework based on deep learning to extract event-related information (e.g., date, location, and type of accident) 
from accident news reports for construction safety management. Rupasinghe and Panuwatwanich (Rupasinghe & 
Panuwatwanich, 2021) proposed a rule-based technique for extracting information about hazards from accident 
reports. Baker et al. (Baker et al., 2020) suggested employing NLP (a collection of text patterns) to uncover injury 
precursors. This context paves the way for investigating more sophisticated Al-driven methodologies capable of 
efficient information retrieval and contextual comprehension and understanding. 


2.2 Information extraction using Large Language Model 


Natural language processing allows a computer to interpret and process natural language text similarly to a person. 
Information extraction (IE) is a branch of natural language processing that obtains needed information from text 
sources. In general, there are two techniques for information extraction [11]: (1) machine learning (ML) and (2) 
rule-based approaches. However, research has focused on using rule-based techniques because training samples 
are few. Large Language Models (LLM) have transformed natural language processing, providing a game- 
changing answer to this problem. 


Within the realm of construction, safety stands as a paramount pillar, with documentation and guidelines serving 
as the backbone to ensure the welfare of all stakeholders. Extracting relevant, actionable information has been a 
persistent challenge with the sheer volume and complexity of safety documentation. The potency of LLMs in 
safety information extraction lies in their ability to discern the context of a query and retrieve information that is 
not just relevant but also actionable. For instance, when asked about safety protocols for handling specific 
machinery, an LLM can sift through a vast repository of safety guidelines, pinpointing the exact procedures, 
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precautions, and best practices. 


Besides, construction safety may benefit from Large Language Models (LLMs) since they can be tailored to 
project-specific data, particularly given the vast volumes of private paperwork projects frequently include. Every 
construction project has plans, safety regulations, and vendor-specific rules, frequently in PDFs. LLMs may be 
trained or fine-tuned using project-specific datasets. The LLM may integrate confidential materials uploaded by a 
company to provide project-specific solutions to inquiries. Project secrecy and relevance are greatly affected by 
LLMs' project-specific customization. Traditional search engines and databases may provide general results or 
need substantial human labeling to identify project-specific data. LLMs automatically comprehend the context 
after being fine-tuned on a project's papers, ensuring that every answer meets the project's particular characteristics 
and criteria. This improves information retrieval accuracy and relevance and keeps sensitive project data in that 
context, protecting private project information. 


3. METHOD 


The primary purpose of developing an approach of extracting construction safety requirements using large 
language model. The structure and key features of the system are shown in Figure 1, which comprises three 
modules. (1) The construction safety investigation module (CSI) gathers building safety rules. The system uses 
safety regulation PDFs for text extraction, preprocessing, and global indexing for efficient search. (2) The safety 
condition identification module (SCI) obtains the CSI database, then the LLM processes user queries, examines 
the indexed rules, and retrieves relevant information with its thorough training. (3) Safety information delivery 
(SID) would address the user and offer feedback to improve system accuracy depending on user replies. 


Construction Safety investigation Safety condition Safety Information 
(CSI) identification (SCI) Delivery (SID) 


Comatructen De 


23 : Praza ea 
| coment | 
| 


Information request 
Large 
Language Context understanding | 


Fig. 1: Proposed approach of Extracting Construction Safety Requirements using Large Language Model 


The Construction Safety Investigation (CSI) module is the foundational block in this approach, concentrating on 
the meticulous collection of safety regulations essential for creating the requisite information database. This 
module predominantly handles a variety of PDF safety regulation documents and serves as the entry point for raw 
safety data. The CSI module has multiple functions, including text extraction, preprocessing, and global indexing. 
Text extraction is crucial, as it converts the information in PDFs into a structured format. Preprocessing then entails 
cleaning and normalizing the extracted text to prepare it for the subsequent phases. 


The Safety Condition Identification (SCI) module serves as the interface between the foundational database created 
by the CSI module and the user-facing delivery module in this approach. The primary responsibility of the SCI 
module is to interact with the CSI database and retrieve pertinent safety information based on user queries. The 
Large Language Model (LLM) incorporated into this module plays a crucial role, utilizing its extensive training 
to process and comprehend user queries in real time. The LLM examines the indexed regulations in the CSI 
database and retrieves relevant information, considering the context and subtleties of the user's query. 
Incorporating LLM into this module ensures that the retrieval process is accurate, context-aware, and efficient, 
providing instantaneous responses to user queries. 


This approach also includes the Safety Information Delivery (SID) module, which focuses on delivering the 
retrieved and processed safety information to the end-user. It serves as the user interface, providing plain, concise, 
and pertinent responses to user queries. Beyond merely delivering information, the SID module includes a 
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feedback mechanism that allows users to rate the accuracy and relevance of the provided answers. This user 
feedback is crucial for refining the system's precision and improving dependability. By perpetually incorporating 
user feedback, the SID module ensures that the system evolves and adapts to the users' changing requirements and 
preferences, maintaining its relevance and effectiveness in delivering precise construction safety information. 


3.1 Prototype development 


Figure 2 depicts prototype development process and tool uses for the proposed approach. The authors used 
Langchain, an open-source Python library for building LLM-powered applications. Utilizing LLMs with vector 
indexing via embedding provides a foundation for the solution. Initially, a comprehensive safety regulation 
database is accessed and processed to collate information predominantly housed in PDF formats. Subsequently, 
this information is extracted, followed by a data cleaning procedure to omit redundant elements, such as 
punctuation, commas, and line spaces. For this operation, a smaller LLM from the Spacy library is deployed. The 
information is segmented into manageable chunks to facilitate efficient filtering, aligned with the embedding 
model's chunk size within the embedding space. After the initial processes, the refined information is fed into a 
text embedding model to formulate and archive the information in a vector database, commonly referred to as a 
vector store, pivotal for advanced information retrieval mechanisms. The essence of the embedding model is to 
transmute high-dimensional textual data into a more condensed representation, aligning with the operational 
frameworks of LLMs, as we can't put our whole PDF textual information in user query or Prompt. For this critical 
transformation, the OpenAI text embedding model is employed. The formulated vector database harboring safety 
regulation information is then integrated into the pipeline, allowing LLMs to perform advanced retrieval of 
information pertinent to user queries. 
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Fig. 2: System architecture 


The creation of the vector store is facilitated using FAISS for efficient similarity search and clustering analysis of 
high-dimensional vector databases. Upon establishing the vector database, it is intricately interwoven within the 
operational pipeline, enabling LLMs to execute sophisticated information retrieval and interaction, utilizing 
OpenAl's GPT-4 through Langchain, a versatile open-source framework for building AI apps and Chatbots. The 
streamlined process integrates FAISS, user queries, and LLMs responses in a seamless flow. When a user initiates 
a question, it is directed to FAISS's sophisticated similarity search algorithm, which extracts relevant information 
from the vector database used by LLMs in embedding form. 


4. CASE STUDY 


The authors performed a case study of safety information extraction related to scaffolding during construction by 
implementing the CSQA approach, as illustrated in Figure 3. The extraction of safety regulations, specifically 
OSHA 1926 Subpart L, is pivotal in maintaining a high level of safety in construction environments where 
scaffolding is utilized. To do this, the authors download A Guide to Scaffold Use in the Construction Industry as a 
PDF file and then upload it to the CSQA prototype system. After that, the safety managers query the information 
related to their needs. By meticulously extracting and implementing each safety provision laid out by OSHA, 
construction companies can significantly mitigate the risk of scaffold-related incidents, protecting workers from 
falls, structural collapses, and falling objects. This process of extracting and adhering to OSHA’s stringent safety 
regulations is essential in fostering a culture of safety within the construction industry, emphasizing the importance 
of the well-being of every individual on the construction site and ensuring the successful and safe completion of 
construction projects. 
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SECTION C - Al, DATA SCIENCE AND ANALYTICS 


Fig. 3: The construction jobsite using both fixed and mobile scaffolding 


In the case study, both fixed and mobile scaffolding were used at the construction jobsite (as illustrated in Fig. 3). 
To prepare for the safety inspection process, the safety manager considers some potential hazards situation that 
may occur during using scaffolding system. The results of the prompting process were illustrated in Fig. 4. In the 
scenario, the safety manager would request information about the maximum number workers allowed to use the 
scaffolding simultaneously. After prompting, the results of extractions were described in Fig. 4. 
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5. DISCUSSION AND CONCLUSION 


The study aimed to improve construction safety by proposing an approach to extracting safety requirements using 
a large language model. A thorough literature review highlighted the significance of safety information retrieval 
and extraction. Accordingly, the safety requirements were collected and tailored for building the database, which 
is contained in the safety investigation module (CSI). The system uses safety regulation PDFs for text extraction, 
preprocessing, and global indexing for efficient search. The safety condition identification module (SCI) obtains 
the CSI database, then the LLM processes user queries, examines the indexed rules, and retrieves relevant 
information with its thorough training. Safety information delivery (SID) would address the user and offer 
feedback to improve system accuracy depending on user replies. Hence, the safety requirements could be extracted 
following the request of site employees. The authors developed the prototype of an LLM-powered application by 
using Langchain to validate the approach. The results show that the maximum number of workers allowed to use 
the scaffolding simultaneously was retrieved from the guide to scaffold use in the construction industry. 


However, the research has the following limitations: (1) The study concentrates on optimizing a database of safety 
requirement information; however, it does not discuss the algorithm's architecture and precision. (2) The case study 
is only used to extract scaffolding-related information. For future studies, the authors will analyze additional 
accident reports and regulations to develop potential hazard situations associated with a specific activity. 
Additionally, the authors will concentrate on developing a system based on the proposed method. Then, we 
examine the effectiveness of the system with larger initiatives and more project members. 
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ABSTRACT: Classical and industrial archaeologies are a complex cultural field where singularity and 
uniqueness are expressed through past memory evidence and identity recognition. In order to obtain these values 
acknowledgement, it is compelling to highlight the material and intangible knowledge by using suitable ICT tools 
capable of handling complexity and managing large sets of heterogeneous data usually subjected to changes, 
different interpretations, inconsistencies and sometimes uncertainty. Although the HBIM method has been largely 
used in the past years, it shows significant limits when dealing with large and heterogeneous information requiring 
the introduction of advanced methods and tools. In this context, this study presents an approach to the architectural 
heritage and historical manufacturing activity representation based on integrating the HBIM process with a 
structured knowledge base, demonstrated through its application to the Sanctuary of Hercules and the former 
Segrè Papermill case study. The work develops an ontology-based system using existing ontologies for the three 
domains of interest: architectural artefact, cultural heritage and industrial processes directly connected with the 
informative model. The intent is to give an overall support system for the complex semantics formalization of these 
assets to aid the interpretation, intervention and valorization activities. 


KEYWORDS: Archaeology, Industrial heritage, Knowledge representation, HBIM, Ontologies, Linked data 


1. INTRODUCTION 


The widespread of various digital approaches in the built heritage field has raised multiple issues related to the 
effectiveness of those different methodologies. In this context, it is compelling to comprehend the complex nature 
of the these specific sites and to define reliable methods to document research and investigation processes for the 
recovery, intervention and valorization activities. Particular assets distinguished by multiple archaeological 
stratifications are emblematic cases that perfectly outline the critical aspects of current practices. The knowledge 
representation and management activities play a crucial role alongside the historical and archival research, the 
integrated survey and the information modelling of these artefacts, to address the challenges of complex heritage 
assets related to their uniqueness and singularity/rarity and to solve critical issues of interoperability, alignment of 
cataloguing systems, in order to make knowledge shareable among the various actors involved in decision-making 
processes. 


Within such peculiar applications, this study tries to apply effective digital knowledge technologies to address the 
main research question, or rather the knowledge representation in a complex evolutionary scenarios, such as 
multiple archaeological sites, by integrating information modelling (BIM) and semantic web technologies 
(ontologies). In particular, with a specific focus on modelling the Segré papermill historical industrial process, 
linked to the architectural artefact and its historical evolution. The aim is to highlight comprehensively the 
industrial archaeology aspects which is defined as an interdisciplinary study field related to both the historical 
issues of the industry’s world and the material culture (processes, machines, workers life). In fact, the industrial 
archaeology represents a combined form of technological and humanistic culture, whose value extends and 
matures overtime. Therefore, the experiment focuses on covering the gap in digital data documentation of 
industrial archaeology and historical manufacturing processes to represent multidisciplinary concepts correlations 
in a machine-readable way. Furthermore, the study proposes an approach to the knowledge representation and 
global management for the investigation process of this unique heritage field. 
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1.1 Current digital documentation and investigation processes for built heritage 


Over the past decade, we have observed a growing focus on Building Information Modeling in the area of heritage 
architecture also, referred to as Heritage (or Historical) BIM / HBIM. Currently, the most established HBIM 
workflow consists of surveying the building by point cloud used as a reference to reconstruct simplified geometries 
employing BIM objects, which are assigned attributes and linked documents, boards and databases helpful in 
enriching the model with information produced and used by specialists in the field (Logothetis et al., 2015; Lopez 
et al., 2018; Pocobelli et al., 2018). 


If, in the case of the design of new buildings, the definition of accurate and complete documentation of what is to 
be built goes hand in hand with the progressiveness of the definition of the project itself, and is, therefore, fully 
consistent with the information practices imposed by BIM environments, in the case of the processes that 
characterize the activities of investigation and restitution of an existing asset the knowledge we have of the latter, 
in addition to coming from extremely heterogeneous sources, is subject to continuous changes, interpretations, 
uncertainties and gaps that must be able to persist until the end of the process and beyond (Bianchini, 2014). In 
fact, information management in this field is still mainly based on a documentary approach (Moscati, 2021) in 
which information is stored by presenting it in a linear and orderly manner but in a flat and static form that does 
not allow the possibility of movement and connection between the information itself. The need to overcome these 
limitations led to the research of new systems that could allow machines to automatically combine knowledge 
from different sources and, even better, derive new knowledge from them. 


Following linked data principles, ontologies are languages used by semantic web resources to represent knowledge 
and concepts within a specific domain (Gruber, 1993). In the case of cultural heritage and more in detail built 
heritage fields, ontologies have been developed to organize and structure information related to buildings, 
historical monuments, archaeological sites, and other forms of architectural heritage. These ontologies enable 
better management, search, sharing, and data interoperability in the built heritage field. 


The direction in which major international research infrastructures are moving (DARIAH, Etc.) seeks to transform 
the 'Web of documents' into a 'Web of data', consisting of concrete 'objects' that machines can process (machine 
understandable). The aim is to provide computers with the ability to combine data to create new knowledge, form 
new connections and draw new conclusions from indexed data, automating what has hitherto been the exclusive 
preserve of humans and minimizing the separation between discovery (locating bibliographic news) and delivery 
(finding the document) so that an ecosystem of metadata can be created. This operation will soon allow global data 
availability within a larger framework of openness and interconnection of information, enhancing the effectiveness 
and visibility of information available online. 


1.2 Existing ontologies for industrial processes representation 


Significant efforts exist to standardize ontologies across industries. Organizations such as Industrial Ontologies 
Foundry (IOF) (Drobnjakovic et al., 2022) and Manufacturing Enterprise Solutions Association (MESA) (Kazil et 
al., 2020) are collaborating to develop common ontologies and data models to improve interoperability among 
manufacturing systems. To date, the mid-level manufacturing ontologies available are the Manufacturing 
Semantics Ontology (MASON) (Lemaignan et al., 2006) and the Manufacturing Reference ontology (MRO) 
(Usman et al., 2013). However, they are only able to represent the production and design part of the manufacturing 
domain (Sanfilippo et al., 2021) and have limited mutual interoperability (Francesconi et al., 2010). 


The Supply Chain Reference Ontology (SCRO) (Ameri et al., 2020) is a pilot ontology that extends BFO and IOF 
Core able to provide the basic ontological constructs needed to represent a supply chain in terms of structure 
(members and their roles, functions, capabilities, relations, and resources) and operations (processes and flow of 
material and information). Similarly, the Supply Chain ONTology (SCONTO) (Vegetti et al., 2016) formally 
describes a supply chain at various abstraction levels. Resources such as workstations, machines, tools and fixtures 
are formally represented in the Manufacturing Service Description Ontology (MSDL) (Ameri & Dutta, 2006). 


Furthermore, with the advent of the fourth industrial revolution (Industry 4.0) and the expansion of the Internet of 
Things (IoT) into the industrial sphere, ontologies become even more important for organizing and understanding 
data from an increasing number of connected devices and intelligent systems (Sampath Kumar et al., 2019). 
Industry 4.0 is mainly based on robotic agents that are responsible for performing the main operations in a smart 
manufacturing environment. The standardization of the knowledge representation is based on standard ontologies, 
which are the CORA Ontology for robotics and automation (Prestes et al., 2014) and ROA Ontology (Cheng et 
al., 2016) that defines the main notions of behavior, function, and goal. Focusing specifically on the manufacturing 
resources, in literature we can find a variety of standard and ontologies that understand resources differently. Based 
on industrial standard, resources are defined as “ any device, tool and means, except raw material and final product 
components, at the disposal of the enterprise to produce good and services” (ISO 15531-1; ISO, 2004a). “The 
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types of resources involved in the manufacturing operational management are: personnel, material, equipment” 
(IEC 62264; IEC, 2013); “Means used by an activity to transform input into output” (ISO 20534; ISO, 2018). As 
we can see the terminology relies on not aligned vocabularies and we can define three different approaches that 
have distinct views on manufacturing resources. The first one relates the manufacturing activities occurrences with 
the resources model (Sarkar & Sormaz, 2019), the second one connects the resource to the activities primarily 
(Sanfilippo, 2018), the last one presents the resources directly related to the agents goals (Sanfilippo, 2018). The 
user case needs should rely on adopting one or the other. Furthermore, as we can see in the literature, many studies 
and approaches exist for contemporary manufacturing processes and Industry 4.0. At the same time, it lakes in the 
representation of historical industrial processes, therefore in the ontology reuse process, we adapted existing 
classes, relations and proprieties to our specific use case. 


1.3 Existing ontologies for cultural heritage representation 


Several knowledge structures have been developed to represent and manage cultural heritage data (Doerr et al., 
2020; Hellmund et al., 2018). For instance, the Getty Foundation developed a SPARQL version of its thesaurus 
(Harpring, 2010). In addition to those mentioned above, one of the most widely used conceptual models to 
describe cultural heritage in general with extensions possibility to fit built heritage is the CIDOC Conceptual 
Reference Model (CIDOC-CRM) (Martin Doerr, 2003). CIDOC-CRM provides a conceptual model to describe 
cultural heritage in general but can be broadened to fit built heritage aspects. 


Extensions of CIDOC-CRM include the CRMba model (Ronzino, 2015) conceived to support the archaeological 
documentation of buildings with an emphasis on recording stratigraphic units and the evolution of the structure 
over time. Furthermore, other specific ontologies have been developed to address specific issues in built heritage, 
such as conservation, documentation, and management. For example, Acierno et al. (2017) extend the domains of 
CIDOC with an ontological structure that covers aspects concerning both the typological and the constructive 
entities of historic buildings, and the documentation and investigation activities conducted by specialists for the 
artefact study and preservation. Colucci et. al. (2021) deal with the formal conceptualization to represent the built 
and architecture domain by proposing an ontological scheme that focuses on connecting semantic and geometric 
information to generate a parametric and structured model from point clouds. 


In a broader context, not confined exclusively to the ontologies field, initiatives such as the EUROPEANA 
(Europeana, 2017) and ARCHES (Myers et al., 2016) projects aim to create open digital infrastructure to enhance 
the semantic representation of data related to built heritage. It is clear that the development of ontologies for built 
heritage is an active field of research and development. New ontologies and approaches continue to be proposed 
and developed, especially to integrate ontological approaches with 3D and HBIM models (Cursi et al., 2022) to 
improve the representation and interoperability of information related to this domain. 


1.4 Digital approaches for classical and industrial archaeologies 


In the archaeological field, the structures and the architectural elements are usually partially visible, restored or 
modified, often with missing parts and multiple transformations during time. Digital representation is a big 
challenge since the archaeological reports on the excavations and related documentation are usually the only 
available data. The models tested in this field primarily deal with virtual reconstruction, starting from different 
survey techniques to immersive and virtual fruition. Most of the case studies considered are as-built BIM to 
represent the actual state of the remains with a BIM semantic structures through manual operations or automatic 
segmentation algorithms (Achille et al., 2015; Bosco et al., 2019; Di & Wu, 2011; Guerrero Vega & Pizzo, 2021; 
Moyano et al., 2021; Scianna et al., 2021; Trizio et al., 2018). In Garagnani’s work (2016), they defined a process 
of information cataloguing ArchaeoBIM as useful for documentation and consultation and for analytical studies 
to accompany the reconstruction activity. The main focus of some of them is on the stratigraphic analysis in the 
HBIM workflows (Diara & Rinaudo, 2020) and the development of the HBIM cloud platform for archaeological 
analysis and documentation (Diara & Rinaudo, 2021). While in some other cases, these applications have been 
implemented for preventive archaeological projects, for instance in Banfi et al. study (Banfi et al., 2020) they 
developed an open-source BIM platform able to merge BIM sensors, monitoring, historic building information 
modelling and VR, with a specific focus on complex scenarios with heritage sites subjected to flood risks and 
water level changes. At the same time, Saricaoglu et al. (2022) defined a method for data-driven conservation 
actions alongside the decision-making process to integrate both levels of geometrical accuracy and the multi-level 
data for the interventions. 


For the industrial archaeology digital process in literature, we can find, from a larger scale, GIS approaches for the 
creation of databases, spatial analysis and visualization (He et al., 2015), to similar approaches related to the 
definition of as-built models, from UAV survey to BIM environment (Barrile et al., 2019). In particular, there are 
some cases where the main focus is the machine apparatus inside the former factories. The integrated survey 
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process of the main components of this architecture type is followed, in one case by an HBIM modelling with a 
deepening aspect of LOD (Currà et al., 2022), in one other with the creation of a virtual tour (Shults et al., 2019). 


Hence, the literature review shows a clear gap in documenting industrial archaeology and the historical 
manufacturing processes. Furthermore, the applications could be more extensive in the data representation and 
relation between different domains to address the possibility of making interdisciplinary concept correlations. 
These gaps represent the starting point for our study which tries to delineate an approach towards industrial 
archaeology and other archaeologies by considering its valuable aspects, defined in the next paragraph, and 
through the integration with semantic web technologies to overcome the evident difficulties in the investigation 
process of a complex heritage site. 


2. COMPLEX ARCHITECTURES OF MULTIPLE ARCHAEOLOGIES: 
RESEARCH, DOCUMENTATION AND INVESTIGATION PROCESSES 


Complex architectural sites, defined by the stratification of multiple archaeology, are a peculiar heritage field 
rooted in the Lazio region. The Sanctuary of Hercules and the former Segrè Paper Mill site in Tivoli — selected 
case study for our research - are unique evidence of the past Roman empire that crossed centuries up to industrial 
time. In fact, the actual state is a combination of classical and industrial archaeology. In this case, the main focus 
of the study regards the semantic definition and knowledge modelling of the industrial processes of the Segrè 
papermill, which is the latest industrial reuse of the Sanctuary, built for some parts of the former manufacturing 
structures such as the ironworks and the powder magazines. This area stands mainly between the podium and the 
northern portico, made of iron and reinforced concrete structures. Although quite invasive, the different reuses 
over the centuries contributed to preserving the porticos dated II century B.C. A second expansion occurred in the 
former ironworks, where another paper machine expanded the production cycle (Cairoli & Ten, 2016). The plant 
had an automated production, and hydroelectric turbines followed later on by an electric cabin serving it. The 
whole complex was decommissioned in 1956, and today, it is partially reused as a museum. 


The factory, machines, objects and documents are explored as part of a system that has historically, socially and 
economically determined the territory and has shaped the landscape. The evidential value reflects activities that 
had and continue to have profound historical consequences and is based on this evidence's universal value 
(TICCIH, 2003). Moreover, the social and cultural aspects are related to an industry, a specific company, an 
industrial community or a particular trade or skill. It may also carry technological and scientific value in 
manufacturing, engineering and construction history or have aesthetic qualities deriving from its architecture, 
design or planning (Douet, 2016). Therefore, documenting the industrial processes represents a way to enhance 
the possibility of continuously perceiving its values and to highlight, in this specific case, how the multiple 
stratifications and different industrial reuses have helped maintain the ruins of classical archaeology for centuries. 
The purpose is to give valid support in the digital processes for built heritage by leaving intact the specific features 
and historical evidence of different periods, actively working to enlighten these diversities during the intervention 
and valorization activities. It is not a simple and pure conservation process since the past transmission proceeds 
through its continuous re-interpretation and through a global and interdisciplinary approach. 


3. KNOWLEDGE BASE FOR HISTORICAL ARCHITECTURE AND INDUSTRIAL 
PROCESSES 


3.1 Ontology development methodology 


The first step to develop the knowledge-based system is choosing the most appropriate methodology that best 
addresses the specific case study. As described in paragraph 2, it is necessary to define and consider representation 
from multiple domains, in particular, it is compelling to highlight how the whole process interacts with the existing 
and new building spaces, related to different evolution phases and how it is combined with the other archaeology 
and study fields. Some ontologies already cover parts of the domains of interest, while in other cases, integration 
of new representations schemas is needed to provide a full coverage of the information for a complete 
comprehension of the artefact. 


In the ontology engineering discipline and, more recently, in its application in the AECO field, the Linked Open 
Terms (LOT) methodology has recently been introduced and developed from the NeOn methodology (Suarez- 
Figueroa et al., 2015), which places particular emphasis on the reuse of existing ontologies - both general and 
domain ontologies - already prevalent in the relevant industry, fostering interoperability and optimizing the 
definition of concepts, attributes and relationships. In a field as complex as multiple archaeology, the potential 
benefits of such a methodology are clear: the presence of different disciplines involved, seemingly far apart, with 
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their terms, concepts and general knowledge structures, requires, first of all, an effort of correlation between the 
various domains and, secondly, the work of building the missing knowledge networks or adapting the existing 
ones. In the specific object of this study, for example, the reuse and integration of ontologies dedicated to the 
representation of construction and buildings, those dedicated to the documentation of cultural heritage, and those 
used to describe industrial processes and/or production facilities are important. 


3.2 Ontology requirements specification 


The use case and purposes concern the documentation of archaeological and industrial heritage to support the 
investigation and knowledge processes for the intervention and recovery of complex palimpsest. The data 
exchange needed for this study is based on different sources. All the information necessary for this study has been 
gathered from many sources starting from a historical and archival study composed of written sources and 
cartographic, iconographic and photographic ones from the two central archives (Segré family and Emo Salvati). 
Besides these historical studies, it is essential to combine them with the archaeological ones, that is, the analysis 
of all the remaining on surfaces and excavation necessary to reconstruct the whole industrial process. The data 
acquisition through an integrated survey is then combined with the information collected through the other sources 
to get a complete information framework. 


The non-functional requirements, which refer to the characteristics, qualities or general aspects, can be identified 
by defining a system able to manage heterogeneous data from multiple domains, to assure flexibility, adherence 
and coherence. Furthermore, consistency should be considered to avoid duplicate information or over-constrained 
property assertion. Extendibility can be achieved by using standardized languages and existing patterns and 
concepts for ontology development (Suarez-Figueroa et al., 2009). For scientific data management, the ontology 
should also be findable, accessible, interoperable and reusable. 


The functional requirements are the ones related to the use case. The documentation of industrial processes 
requires the definition of the activities performed in the process stages and the different resource types used 
(machines, humans, raw materials, semi-finished products). A detailed description of each production stage is 
necessary for a comprehensive understanding of the process, the machines and the humans/workers, all valuable 
and fundamental aspects for documenting and investigating industrial archaeology. Along with defining the 
manufacturing activities, the ontology needs to determine the spaces or rooms where the activities used to be 
performed, in terms of architectural and technological aspects through spatial and technological entities and the 
relation between production process, machines and building components and spaces. Layered archaeologies are 
complex sites where interpretation is usually a complex activity. Therefore the representation of the evolution 
phases should concern the whole complex and single building elements to interconnect these three domains of 
interest in an interdisciplinary way by covering all the knowledge useful for the intended documentation activity. 


3.3 Ontology conceptualization 


From the sources listed in paragraph 3.1, the first attempt in the conceptualization work was to delineate the 
complete cycle highlighting classes or concepts necessary to describe the whole production system. 


In 1935 the mill was expanded, presumably introducing this cycle. The raw materials used for paper making were 
rags and wood logs. The process based on the rags, our main focus in this case, started from the arrival at the 
factory in bales where they were stacked temporarily in large deposits. The processing of rag consisted of cleaning 
and sorting according to various qualities and cutting them in special cutting machines. Once they were selected 
and separated, they were cleaned a second time by tumblers and beaters, then shredded and leached under pressure 
with lime or soda in spherical or cylindrical kettles. These operations were followed by fraying in Hollander piles, 
sort of tanks filled with water where a cylinder with knives reduced the rags into filaments. After that, the produced 
“half pulp” was conveyed in special tanks and subjected to bleaching with calcium chloride before being fed into 
the Hollander refining machines. In the Hollander refiners, the half pulp was beaten by other rotating cylinders 
and converted to “all pulp”, ready to be transformed into paper. Thus the pulp was coloured and glued with resin 
soap to give texture and to prevent ink from spreading on it. In the paper machines, the pulp was stretched into a 
thin layer of thick water on a wire cloth, which was then detached into a veil and it was passed over a long woolen 
felt around a series of huge drying cylinders, heated by steam, and finally dried, it was rolled up as a raw product . 


The long ribbons produced by the paper machines were directly pressed and smoothed by giant calenders and then 
passed to the rolling machines. In the preparation rooms, the paper was cut into sheets and reams were assembled; 
in the end, the sheets were sorted one by one to be packed and shipped or for other finishing works. 
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Fig. 1: Use case data of the latest production process of the Segrè papermill. 


Fig. 2: Spaces, activities and machines in the latest Segrè papermill production process. BIM model carried out 
by Andrea De Pace and Riccardo Rocchi under the supervision of Edoardo Currà. 


Based on the studies conducted by Sanfilippo et al. (2021), we detailed the definition of the process plan, the set 
of sequential activities decomposing the process, the input component or materials for each activity and the 
requirements to execute each activity. As shown in the figure, table 1 defines the critical aspects of the use case 
by listing the activities and their description, input components of each activity, and resources needed to execute 
each activity, which can be subdivided into manual and automated resources. 


After that, it was necessary to identify key concepts to relate the papermill cycle with the building spaces where it 
used to be performed, which is represented in the first column of Figure 1. The spaces and the building components 
are required to describe the different halls built in overlapping periods of time. For this reason, it is fundamental 
to briefly sum up the significant historical stages of the investigated area and, consequently, the evolution of the 
construction methods related to the different phases. As we already have introduced, the latest industrial production 
was installed in the northern portico and sacred area of the Sanctuary of Hercules. Over the centuries, many 
different productions and constructions intersected. In particular, in the 1920s, the mill already had three paper 
machines, with a layout maintained until its dismission. Between 1923 and 1926, took place the elevation and 
expansion of the paper machine room in the sacred area. Thanks to the new technologies, a radical change occurred 
in the 1930s through the widespread use of reinforced concrete structures and prefabricated frames for large roofs. 
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In this period, the rag room was renovated and expanded, preceding the construction of a paper storage room and 
a paper sorting room on a two-level building. During the war, the roofing of the sacred area rooms suffered 
substantial war damage. Three wooden truss roofs were replaced with reinforced concrete structures and two barrel 
vaults with SAP technology. Overall, there was no real expansion but a simple consolidation of the overall 
preexisting layout. In Figure 2 some of the spaces/areas, activities performed and machines are represented in the 
informative model. 


Once the main domains, concepts and classes are described, the following step of this study, detailed in the next 
paragraph, concerns the research and reuse of existing ontologies and the encoding activity. 


3.4 Ontology reuse and encoding 


The three domains considered for this case study are the Production Process, the Architectural Artefact and the 
Historical Evolution and they are all connected since the process and the manufacturing tasks are performed in 
specific spaces that belongs to different areas or rooms of the site which are built in various phases of the papermill 
production. After reviewing the existing ontologies for all the listed domains we used for the Building spaces and 
elements the IFC Standard, while for the production process we chose an extension of the IFC and the FA ontology 
that is mainly focused on the relation between the used resources and activities performed. For the historical 
evolution documentation we based the whole reconstruction process on the CIDOC-CRM (Fig. 3). 


‘Production Process ~ . a Architectural Artefact 
Ifc:https //standards.buildingsmart.orgAFC/DEVAIFC2x3 7 . i 
/FINAL/OWL ifc-Process ; ifc:SpatialStuctureElement 
ifcext: http//www.ontoeng.com/IFC4_ADD1_extension# i a : pan i 
fa: http:/Avww.ontoeng.com #actory# EEE 
cidoc-crm: http:/Avww.cidoc-crm .org/cidoc-erm ' fa:ManufacturingTask [1a:isPerfomedin >| ife:Space 
IA ispartof 


Domain of interests i 
cidoc:E77Persistentitem 


ns:class | Class 


cidoc:E18Physical Thing «~~ 
<———_ nsccobjectProperty ys g ; 


<+ <<rdfs:subClassOf>> 
— - Historical Evolution 


Fig. 3: Knowledge modelling of the three domains considered: the production process, the architectural artefact 
and the historical evolution. 


The chosen ontologies for the production process modelling provide an extendible representation of the factory 
entities related to production systems, resources and products. The starting point was a simplified version of the 
IFC based on two main classes. A process (ifc: IfcProcess) "is an event that transforms input-output products while 
making use of resources under specific control rules. A resource (ifc: [fcResource) represents an entity that is 
needed to perform the process. In turn, various objects, such as physical products, people, and materials, can be 
employed as resources". Then we integrate this approach with the one proposed by Sanfilippo et al. (2021). They 
started considering three existing high-level approaches and trying to unify them in a single and complete 
framework. In this way, it was possible to consider and represent manufacturing by taking into account activities, 
goals and activities occurrences during the entire process since the IFC-based modelling is not able to distinguish 
alone between the entities and their relations. On the other hand, through this integrated methodology, the 
properties are explicitly stated, and they can provide a useful explanation of the case study. For instance, the 
activity description is a class that describes the specific manufacturing process description where whitewashing is 
an individual and refers to conveying the half pulp in special tanks and subjected to bleaching with calcium 
chloride. The input components for each activity list the components needed for that singular operation and they 
are detailed through the property hasComponentReg. For the activity whitewashing, the input components are half 
pulp and calcium chloride, and the output of the same activity, which is expressed through the propriety hasOutput 
is the whitened half pulp that is combined with other components and becomes the input resources of the next 
activity. The activity requirements list the resources and capabilities that are necessary to perform specific tasks. 
For instance, the whitewashing activity hasResourcesReq tooling and handling that are individuals of 
CapabilityDescr. The Resource in Automatic solution and the Resource in Manual solution list the description of 
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resources either in the automatic or manual configuration that can execute the activity; Whitewashing Hollander 
machine and Operator satisfy the two resource requirements and are correspondingly individuals of ArtifactDescr 
and Descr. 


The production process domain and the building artefact domain are connected since the fa: Manufacturing Task, 
a subclass of Ifc: Process, isPerformedin a specific place of the papermill complex. In fact, the architectural 
artefact knowledge modelling started from the class [fc:JfcSpace which is defined as "an area or volume bounded 
actually or theoretically. Spaces are areas or volumes that provide for certain functions within a building". The 
main classes considered are both subclasses of the Ifc:IfcProduct, in detail, the [fc:IfcBuildingElement indicates 
"all elements that are primarily part of the construction of a built facility, i.e., its structural and space separating 
system. Building elements are all physically existent and tangible things". The latter class was useful to describe 
the building components, and it was divided into Load Bearing Skeleton itself subdivided into Steel Frame and 
Concrete Frame; Horizontal Closures divided into Horizontal Base Closing, Horizontal Intermediate Closing and 
Horizontal Top Closing that can be Plane, Sloped or Curved; and Vertical Closures. For instance, considering the 
manufacturing task whitewashing, it was performed in the Hollander Beater Room, an individual of Ifc:IfcSpace, 
and it is enclosed in Industrial floor! as Horizontal Base Closing, and Barrel Vaults as Horizontal Curved Closing, 
which replaced the Horizontal Sloped Closing Wooden Truss Roof that was damaged during the second world war. 
For the Vertical Closing, we identified two different closings instantiated as Tuff Wall 1 and Tuff Wall 2, probably 
built in two different periods, and they have incorporated the structures derived from the Hydroelectric and 
hydraulic canals Construction. 
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Fig. 4: Instantiation process of the whitewashing activity performed in the Hollander Beater Room. 


Moreover, the building construction phases are very intricated due to the multiple overlapping structures, and 
therefore a clear explanation is necessary for a deep understanding of the artefact elements. For this reason, the 
Spaces defined in the architectural artefact domain are connected with its historical evolution through the 
property ispartof, that associates the Ifc:IfcSpace with the E18 class of the CIDOC-CRM PhysicalThing. As 
explained in paragraph 1.1, the CIDOC-CRM provides a common and extensible semantic framework for 
evidence-based cultural heritage information integration. The classes used are E18 Physical Thing appropriate for 
the definition of different rooms where the production process took place, i.e. the “Half Pulp” processing area. 
The E18 is a subclass of the E77 Persistent Item, which defines items with a persistent identity, sometimes known 
as “endurants” in philosophy. For example, in our case, it could be interpreted as New Warehouses Definition, 
considering the latest papermill production. E66 Beginning of Existence is instantiated as Papermill 
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Construction and it is connected with the E77 Persistent Item through the propriety hasBroughtintoExistence and 
through the property hasTimeSpan it is related to E52 Time-Span instantiated as 1900s. Those classes are 
fundamental to documenting a specific state of the artefact, and we also considered other CIDOC classes to 
document the transformations that modified the physical structures and also functions. To express the 
transformation, we considered E81 Transformation, instantiated as Adaptation to Papermill Production. Other 
more detailed classes and proprieties can be helpful, such as E11 Modification and its subclasses E79 Part Addition 
and E80 Part Removal, which can be directly related to the E18 Physical Thing through the propriety hasModified. 


In Figure 4 is represented a small portion of the instantiation process which regards one of the activities performed 
in the Hollander Beater Room. All three domains are represented and are interconnected with each other. Each 
single manufacturing task is then connected with the following tasks through the input components defined as 
Component requirements and the outputs which are the input of the subsequent activity. The tasks are then 
connected with the spaces and the historic evolution domain as explained above. As defined in the picture, we 
combined classes from existing ontologies with integrated classes, we represented the individuals of each class in 
red, correlating them with object properties. 


4. INTEGRATING KNOWLEDGE-BASED REPRRESENTATION AND HISTORIC 
BUILDING INFORMATION MODELLING 


The knowledge formalization related to the latest papermill production, which includes the disciplines presented 
above, is then integrated with geometrical and technological aspects of the artefact components. The first 
connection between the knowledge structure and the informative model has been made through the correlation 
between the /fc:IfcSpace of the knowledge base and the /fc-JfcSpace of the BIM model. The IfcSpace is defined, 
for our case study, as the rooms or areas where the activities were performed. The integration of the two models 
is a critical process. It is necessary to carefully decide which information can better connect these two 
environments to ensure interoperability and data alignment. 


In the BIM environment, each room has a label, for instance, Paper Machine Room or Hollander Beater Room, 
individuals of the /FCSpace. The label correspondence is between the name of the room in BIM and the individuals 
of the knowledge base Paper Machine Room or Hollander Beater Room, instances of the [fcSpace Class. 
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Fig. 5: Label correspondence between the rooms in the model and the individuals in the knowledge base. 


The label corresponding process is represented in Fig. 5, through this integration activity is possible to underline 
a fundamental aspect of this approach. In fact in this way, we can connect the entities with other concepts, abstract 
or concrete which belongs to different domains in a cross disciplinary way. Using only the BIM representation 
schema is quite limited since many relations and concepts cannot be fully interpreted and represented. This 
approach can answer some of the critical issues related to the current practices in the digital built heritage 
processes. The documentation of the artefact requires the representation of a broader domain of knowledge, which 
includes extremely diverse and specific aspects such as historical, cultural contexts, construction techniques, and 
the history of materials which guide its accurate interpretation. Therefore it is possible to represent and manage 
information from an incremental and recursive perspective and from heterogeneous sources. The complex nature 
of built heritage fields needs appropriate tools and approaches to better address the multiple and open issues. 
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While in the first case, the label correspondence was between spaces and rooms, in this second case, we focused 
on the building elements. The ifc schema has a rigorous hierarchy representation, and sometimes to represent the 
heritage buildings, we are forced to use this schema even if it does not address the singularity and uniqueness of 
these artefacts. The building components of a historical building may not precisely correspond to specific classes 
in the ifc structure, especially when considering overlapping structures where every element may have acquired 
different meanings and functions over time, embedding a wide range of historical, cultural, social and 
technological values. In Figure 6, we can see the label correspondence between the two models by considering 
some of the building elements. For the architectural artefact representation, it was included in the knowledge base 
another class level as a subclass of the ifc:-BuildingElement defined by Ja:HorizontalClosures subdivided 
in la: HorizontalBaseClosing and Ia:HorizontalSlopedClosing, and la: VerticalClosures. The instantiation process 
represented within the red boxes in the diagram is conformed to the individuals in the BIM model, which are 
likewise instances of ifc: Wall, ifc:Slab and ifc:roof. 


The possibility of correlating concepts outside the informative model helps us to represent the authenticity of the 
heritage artefacts, the overlapping structures of the building components, the inconsistencies, and the different 
interpretations of single or multiple elements. These are all steps of the documentation and investigation processes 
that all the specialists perform in their field and, through this integrated approach, can be represented consistently 
in a machine-readable way. 
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Fig. 6: Label correspondence between building elements in the model and the individuals in the knowledge base. 
BIM model carried out by Andrea De Pace and Riccardo Rocchi under the supervision of Edoardo Currà. 


5. CONCLUSIONS 


This research focuses on modelling the main features of industrial archaeology in a multi-layered historical site. 
We developed a structured knowledge base integrated with the BIM model to document the latest papermill 
production of the Segrè family. The historical industrial process and the manufacturing machines have shaped over 
time the architectural assets built on the remains of the Roman sanctuary and the former iron and powder 
productions. The knowledge base creation started by defining the domain of interests: production process, 
architectural artefact and historical evolution, followed by the knowledge base creation through an ontology 
conceptualization, reuse and encoding activities. Finally, the last part deals with integrating the knowledge base 
with the informative model made through a label correspondence between the same IFC classes. 


The novelty of this integrated approach addresses some issues presented in section 1. Starting from a clear gap in 
the digital documentation process of the industrial heritage field, we proposed a methodology for the knowledge 
representation that considers all the domains of interest, with a specific focus on the industrial process, to customize 
the existing ontological approaches for contemporary manufacturing and Industries 4.0. Over time, the aggregated 
work on multiple industrial heritage sites can extend the queries on a large amount of data and scale to define 
similarities through the use of computable knowledge, highlighting differences and multiple interventions and 
recovery actions based on regional and transnational cases of industrial archaeology. 


The application and validation of the proposed approach in similar contexts are necessary to highlight 
improvement aspects, extend the concepts and relations of the considered domains, and implement other valuable 
domains. Moreover, the difficulties faced during the knowledge acquisition process, caused by data inconsistency 
and uncertainties, could be managed by new ways of knowledge representation systems able to document and 
grade multiple interpretations schemas. Indeed, the following step of this research is to overcome the above- 
presented limits of this approach to better represent and manage the built heritage knowledge. 
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ABSTRACT: Assessing the structural integrity of unreinforced masonry structures is a complex and time- 

consuming process that necessitates the knowledge of various experts and meticulous cross-referencing of diverse 
data to achieve a comprehensive understanding of the building. In recent years, the Architecture and Construction 

Industry has witnessed a digital transformation, largely driven by Building Information Modeling (BIM). BIM has 
proven immensely valuable in the conservation of historic buildings. However, while it excels in new construction 

projects, its full potential is not fully realized when dealing with existing structures. A clear example of this 
limitation can be observed in the Industry Foundation Classes (IFC) format, which lacks instances necessary for 
accurately representing existing building features. This research contribution aims to advance the process of 
semantic enrichment of BIM for existing buildings, building upon findings from existing literature. Leveraging the 
Linked Data Approach and utilizing both existing ontologies and newly proposed domain ontologies, the objective 
is to facilitate the identification of vulnerabilities and potential local failure mechanisms. The geometric 
information of the building is represented in the IFC STEP format and enriched semantically by establishing new 
relationships between classes that are not present in the standard IFC. This approach is applied to a case study in 

the historical center of Castelnuovo di Porto, Italy. The results of this work demonstrate how the proposed model, 

enhancing the BIM representation of existing buildings and enabling better identification of potential weaknesses, 

contributes to improved preservation and seismic resilience of historic structures. 


KEYWORDS: BIM, Linked Data, Semantic Modeling, Historic Constructions, Structural Masonry. 


1. INTRODUCTION 


Before the advent of modern construction techniques, buildings were raised employing local materials and 
construction methods, resulting in a large percentage of the built heritage being composed of unreinforced masonry 
structures. The effectiveness of unreinforced masonry constructions depended on adhering to a set of empirical 
rules known as the ‘rule of the art (Antonino Giuffré et al., 2010). This aspect becomes particularly critical in 
seismic scenarios, as past seismic events demonstrated that following the 'rule of the art! ensures walls exhibit a 
monolithic behavior, fundamental to being resistant to earthquakes. Furthermore, the proper connection between 
structural elements is another critical factor that helps prevent out-of-plane local failures. These failures can happen 
when walls or portions of walls collapse outward during an earthquake, posing a significant danger to both the 
structure and its occupants (Antonino Giuffré, 1993; Antonino Giuffré et al., 2010). 


The Italian Code, which is a major reference for the assessment of historic buildings, provides three levels of 
analysis of the structural behavior of existing unreinforced masonry buildings: (i) Identifying the shear strength of 
the masonry under examination, (ii) Verifying local mechanisms, and (iii) Conducting global numerical analyses. 
(Norme Tecniche per Le Costruzioni, 2018). The three levels of assessment become increasingly more 
comprehensive, with their accuracy contingent on the modeling assumptions. Consequently, opting for the simplest 
level of assessment would be preferable when knowledge is limited. 


Considering these principles, a precise evaluation of the structural behavior of existing non-reinforced masonry 
buildings, utilizing more advanced methods, necessitates a thorough examination of the structure, involving 
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experts from diverse fields. (ICOMOS, 2005). Consequently, a systematic methodology is needed to allow the 
integration of data of different types, avoiding the repetition or defeat of pivotal information. 


With the introduction of Building Information Modeling (BIM), the architecture and engineering industries have 
significantly changed their processes. This technology, although it was developed for the construction of new 
buildings, has not gone unnoticed in the field of rehabilitation of historic buildings. Today, the term HBIM (Historic 
Building Information Modeling) identifies the application of BIM technology to historic buildings (Maurice 
Murphy et al., 2009). The HBIM methodology has been explored as support to various areas of conservation 
(Pocobelli et al., 2018; Volk et al., 2014). Relevant efforts have been done to improve the representation of complex 
geometry, mainly with the integration of advanced survey acquisition methods (Cotella, 2023). HBIM applications 
exploit the potentiality to map damage and deformation accurately (Barontini et al., 2022; Moyano et al., 2022), 
conduct simulations (Gigliarelli et al., 2017; Ursini et al., 2022), manage the intervention on site (Biagini et al., 
2016) and optimize facility management (Piselli et al., 2020). 


Despite its widespread use, HBIM encounters challenges in its application, mainly because the original BIM 
methodology was primarily introduced for new construction projects. In reality, even the use of Industry 
Foundation Classes (IFC) needs to be enhanced to digitize existing constructions. Consequently, the semantic 
enrichment of HBIM models has emerged as an increasingly researched area. Due to the multidisciplinarity of the 
conservation field, the use of Semantic Web Languages through a Linked Data approach is gaining momentum 
(Cursi et al., 2022). The advantage of this methodology is that it allows modeling domain-specific information 
using specialized ontologies, which can be employed as external links to enhance the content of the BIM models. 


From a broader perspective, the use of semantic web standards such as Resource Description Framework (RDF) 
and Web Ontology Languages (OWL) has the advantage of providing interoperability between data of different 
domains which are published on the web. On the other hand, IFC is written in EXPRESS language, and has a 
strong emphasis on the tridimensional representation of the geometry, while remaining difficult to integrate with 
other web sources (Rasmussen et al., 2020). 


The ifcOWL ontology has been a pioneering attempt to extend the content of IFC to the semantic web (Beetz et 
al., 2009). However, due to its extensive length and complexity, it becomes challenging to implement and utilize 
in practical applications. As an alternative, other more contained ontologies have been proposed to represent 
construction instances in the semantic web. Under this approach, spaces are defined using the Building Topology 
Ontology (BOT) (Rasmussen et al., 2020), building elements with the Building Element Ontology (BEO) 
(Pauwels, 2018), and materials with the Material Property Ontology (MAT) (Poveda-Villalon & Chavez-Feria, 
2020). The tendency of modularization of information based on different domains also interested the 
BuildingSmart Technical Room. Indeed, for the next generation of IFC, it is proposed to have a common base 
layer, connected to several extensions belonging to different domains (Berlo et al., 2020). In addition, the IFC base 
schema will be language-independent, ensuring greater interoperability with formats currently in use in other 
fields, including RDF. 


In the past, there have been proposals for large and complex ontologies to represent historical data. The CIDOC- 
CRM for instance (Crofts et al., 2003), has been mainly developed for museums, but then extended to other 
domains such as the representation of non-destructive testing techniques (Kouis & Giannakopoulos, 2014), 
annotation of degradation phenomena of stones (Veron et al., 2015), and also for the semantic enrichment of HBIM 
models (Acierno et al., 2017). However, currently also in the field of historic constructions, there has been a recent 
preference for using a network of modular ontologies instead of a single complex ontology (Bonduel, 2021). This 
facilitates the better management of the ontology and the connection with different domains. 


This paper aims to propose a method to digitize current methods for structural assessment of existing unreinforced 
masonry buildings. The purpose is to improve the management of the alphanumeric data associated with three- 
dimensional models, stressing standardization and interoperability. Two new domain ontologies are proposed: (i) 
Historic Masonry Ontology (HMO); (ii) Failure Mechanism Ontology (FMO). The first represents masonry 
material, while the second represents the vulnerabilities associated with specific types of masonry collapse. The 
two ontologies can be used together or combined with other domains. In particular, they can be used for the 
semantic enrichment of BIM models, combining geometry representation and alphanumerical data. This is 
demonstrated using a web app to map IFC and Turtle files (A. Donkers et al., 2023). 


This paper is organized as follows. After this introduction, the next chapter is ‘Materials and Methods’, followed 
by ‘Results and Discussion’, and ‘Conclusions’. 
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2. MATERIALS AND METHODS 


From a structural point of view, masonry is a heterogeneous material, constituted by units and joints. Units are 
bricks or stones which actively contribute to the load-bearing capacity and stability of the wall. Joints are the 
junctures between masonry units and can be either dry or filled with mortar. The most resistant masonry should 
have an arrangement of discrete elements such that monolithic behavior is ensured. 


When the wall exhibits monolithic behavior, structural simulations can be conducted by considering a 
corresponding homogeneous material. However, even in such cases, the wall's morphology needs to be assessed, 
to define the most appropriate modeling assumptions. Indeed, the Italian Building Code designates specific 
mechanical parameters of reference, considering the type, size, materials, and arrangement of units and joints. 


With these premises, it is evident that for the structural analysis purpose, it is necessary to provide a comprehensive 
representation of the masonry material, considering its heterogeneous features. In the already existing 
representation schemas, such as the IFC or the MAT ontology, there is no possibility of having such a detailed 
description. Wall instances are indeed associated with ‘material layers’, and to each layer, it is attributed and 
homogeneous or even heterogeneous material, but where the different material's constituents are not defined as 
classes. In this way, it is possible to represent discontinuities only along the cross-section of the wall, which is not 
representative of the typical configuration of masonry walls (Figure 1). 


To fill this gap, a new ontology was designed. This Historic Masonry Ontology (HMO), was conceived as a 
foundation ontology, to be used for the structural assessment of historic masonry structures, independently from 
the type of analysis (i, ii, or iii level). Then, based on the specific analysis level, other ontologies can be merged 
into the HMO. As a proof of concept, the Failure Mechanism Ontology (FMO) was subsequently developed to 
model the causes and consequences of failure mechanisms in masonry walls. The FMO is linked to the HMO 
ontology due to the relationship between masonry quality and wall vulnerability. 
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Figure 1. Morphology of an historic masonry wall compared to material modeling in IFC. 


Both the HMO and the FMO ontologies were integrated with existing ontologies, exploiting the interoperability 
of the Semantic Web Modeling. In particular, the connection with the BEO and the MAT allows a direct mapping 
between the semantic model and the BIM model in IFC. 


To link data between the model BIM and the model in Semantic Web Language, information is mapped between 
IFC model classes and corresponding ontology classes, using a common GUID. Damage elements do not have a 
direct equivalent in IFC, but can still be modeled as IfcElementy proxies and mapped to the ontology via GUID 
as well. 


As shown in Figure 4, building elements, damages, and materials serve as a bridge between the IFC and the 
semantic model, allowing geometries to be associated with the semantic model based on the HMO and FMO 
ontologies. 
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Figure 2- Methodology for the semantic enrichment of the BIM model 


2.1 Historic Masonry Ontology 


The Historic Masonry Ontology was implemented for the detailed modeling of masonry materials. Given the wide 
variety of masonry types, it was decided to propose a rather generic ontology that could be used to represent all 
types of masonry, regardless of units and mortar materials and morphology. 


The walls are modeled with the class hmo:Masonry Wall, which is a subclass of beo:Wall. The connection between 
the HMO and BEO ontologies is fundamental, both for interoperability between domains and semantic enrichment 
of the BIM models. An hmo:MasonryWall is defined by two data properties: (i) hmo:wallName and (ii) 
hmo:quality. The hmo:wallName allows the identification of different masonry walls in a human readable manner; 
the hmo:quality is intended to provide a qualitative description of the wall's quality to represent compliance with 
the rules of art in a synthetic manner. 


The masonry layers are modeled by the class hmo:MasonryLayer, which is related to hmo:MasonryWall by the 
property hmo:isLayer of, the inverse of hmo:hasLayer. Each masonry layer may contain one or more hmo:Patterns. 
A pattern refers to a specific section of the layer, characterized by well-defined units and joint types that can be 
easily standardized. 


The necessity for employing more than one pattern to describe a certain masonry type becomes especially critical 
when attempting to represent masonry types that involve various brick resources. These diverse brick resources 
contribute to the formation of walls comprising units with irregular compositions, interspersed with brick elements, 
as shown in Figure 1Figure 3. 
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Figure 3. Elevation view of an example masonry and corresponding classes in HMO ontology modeling. 


Units and joints can be modeled employing specific subclasses of the hmo:PatternEntity, which are hmo:Units and 
hmo: Joints. Joints are modeled as interfaces of the units, referring to the class bot:Interface. Units and joints present 
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specific features, modeled as data properties. These classes refer in a general way to all the units and joints of the 
wall, so general information, such as the maximum and minimum dimensions of the units, or the horizontality and 
verticality characteristics of the morrtar, are assigned. 


Both hmo:Unit and hmo:Joint henerith the mat:Materials from the superclass hmo:PatternEntity. The class related 
to the material to be associated with units and mortars does not need to be re-modeled in this ontology. In fact, it 
is intended to refer to and link to the MAT ontology. In addition, material characteristics can be linked to a database. 
This approach, derived from the Building Performance Ontology (BOP) (Donkers et al., 2021) developed for 
building performance assessment, is well-suited for masonry applications. The complexity of defining certain 
parameters leads us to rely on established Databases (Vanin et al., 2017). Finally, the ontology relates to the 
ontology of Damage Topology Ontology (DOT) damage, since the presence or absence of certain damage is an 
indicator of quality (Hamdan et al., 2019). 


Figure 4 presents the Historic Masonry ontology and the link with other existing ontologies. 
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Figure 4 - Overview of the Historic Masonry Ontology 


2.2 Failure Mechanism Ontology 


The ontology for Failure Mechanisms enables the modeling of expected failure mechanisms by modeling the 
vulnerabilities that cause them. 


The ontology consists of two primary classes, namely, 'fmo:Vulnerability' and 'fmo:FailureMechanism'. These 
classes are interconnected through the object property 'fmo:isFacilitatedBy', which establishes a relationship 
between mechanisms and vulnerabilities. A specific mechanism can be facilitated by one or more vulnerabilities. 


To account for the qualitative nature or a combination of qualitative and quantitative aspects in vulnerability 
descriptions, distinct sub-classes are defined for 'fmo:Vulnerability’. This approach enhances comprehensiveness 
by providing dedicated classes for different types of vulnerabilities. 


One of the subclasses within 'fmo:Vulnerability' is 'fmo:BadMasonryQuality', which relates directly to the 
previously described ontology. Defining masonry quality requires detailed description of its morphological 
characteristics. The object property 'fmo:isInfluencedBy' serves as a connection between the 'hmo' and 'fmo' 
ontologies. It is anticipated that other vulnerability subclasses can connect with various domain ontologies to 
address specific aspects. For example, determining whether floors are pushing and causing the presence of 
horizontal thrusts (e.g., 'fmo:HorizontalThrust'). 
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Within the ‘'fmo:FailureMechanism' class, several subclasses exist, such as 'fmo:InPlaneFailure’, 
'fmo:HorizontalBending', and 'fmo:VerticalBending'. These classes require enrichment with a set of properties, 
which can indicate the associated load conditions for a particular mechanism. Additionally, the 
‘fmo:FailureMechanism' class is related to the 'dot:DamageArea' class from the DOT ontology. This relationship 
acknowledges that the presence of damage may be a result of an ongoing mechanism. 


An overview of the complete ontology is shown in Figure 5. 
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Figure 5 - Failure Mechanisms Ontology 


3. RESULTS OF THE PRACTICAL APPLICATION AND DISCUSSION 


To concretely illustrate the application of the proposed methodology, a residential building located in the historic 
center of Castelnuovo di Porto was selected. Castelnuovo di Porto is a town in central Italy known for its rich 
heritage of historical buildings and a diverse range of architectural styles, including residential structures, 
churches, and public facilities that reflect the area's rich history and architectural evolution over the centuries. The 
choice of Castelnuovo di Porto as a case study was driven by its structural complexity and the need to address 
specific challenges associated with evaluating historic buildings situated in urban environments with dense 
historical and cultural value. 


The chosen methodological approach was particularly applied to a building that presented a set of structural and 
conservation challenges. This specific building, labeled as 'Wall 417_a' in the model, is part of a group of 
interconnected masonry structures, forming an architecturally significant complex. 


In the BIM environment, the construction was modeled using proprietary software, and exported according to the 
IFC schema. Load-bearing walls were modeled, adding windows and doors as simple holes, modeling arches 
where present. Damages were included using the [fcBuildingElementProxy class. These elements simply serve to 
visualize, in the geometric model, the location of the damage. In the IFC file, there are no identified alphanumeric 
properties for the damages, nor any taxonomic relationship with other building elements. Regarding masonry 
materials, in the IFC these are identified as homogeneous, to be associated with a certain MaterialLayer. 


A turtle file was created to proceed with the semantic enrichment. This particular application concentrates on the 
main facade of the building, referred to as "Wall_417_a,' which is modeled using the BEO ontology. The name 
attributed to the façade refers to the number of the urban parcel: 4/7, followed by the letter a as it is the first façade 
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assessed. The wall is further represented in the model as a hmo:MasonryWall along with its corresponding 
hmo:MasonryLayer. Due to the initial survey's limited accessibility, only the external layer was accounted for. 
Therefore, the focus remained on modeling the hmo- ExternalMasonryLayer and associating it with a hmo- Pattern. 
The details regarding the entities of the hmo-PatternEntity are shown in Figure 6. 


The structural damage is modeled as both dot:StructuralDamage and as a fmo:Symptom, since the presence of a 
crack could be a symptom of an out-of-plane mechanism. In detail, the presence of damage in one facade can be 
indicative of an out-of-plane mechanism occurring on the orthogonal facade. Through semantic modeling, this can 
be made explicit, as it was done for this model. 


The roof was added to the semantic model as a beo: Roof, modeled as a fmo:HorizontalThrust, which is a subclass 
of fmo: Vulnerabilities, since a 'pushing' roof can cause the overturning of a wall. Consequently, the mechanism 
was modeled as fmo:Overturning, associating the mechanism instance to (i) the wall where it occurs; (ii) the 
pushing roof that caused it, (iii) the structural damage which represents its symptom. 


The interactive mapping of the IFC model to the semantic model can be facilitated through web-based integration, 
leveraging JavaScript modules like IFC JS and COMUNICA. In this process, elements from both models are 
correlated using their respective GUIDs, allowing seamless cross-referencing between the two representations. 
This approach has been already proposed in the literature (A. Donkers et al., 2023) to query the information of the 
semantic model by clicking on the IFC geometry. 


Figure 6 presents a comprehensive overview of what is described above. Within the IFC model, the classes 
corresponding to the semantic model are visually indicated by distinct colors. Notably, the semantic model consists 
of a network of interrelated classes, showcasing its capacity to define relationships that surpass the limitations of 
the IFC model. An example of the query interface is shown as well. 
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Figure 6. Mapping the IFC and the Semantic Model. 


The employment of the Historic Masonry Ontology enables the modeling of diverse wall thicknesses while 
considering the patterns formed by units and joints. This domain ontology offers versatility and conciseness, 
accommodating various masonry typologies. Moreover, its seamless integration with established ontologies like 
BEO, MAT, and DOT enhances its utility, particularly in enriching IFC models with semantic data. Addressing 
the limitation of standard representations, it allows accurate association of material properties with specific 
masonry elements, such as bricks, stones, or mortar. 
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The adoption of semantic language within these ontologies results in enhanced interoperability, with the 
potentiality of extending into fields like chemistry and facilitating the assessment of material degradation. The 
possible interaction with databases presents valuable opportunities for deriving mechanical properties from data 
and integrating ontologies into practical applications, including inspections and monitoring processes. By 
integrating the Historic Masonry Ontology with the Failure Mechanism Ontology it was possible to consider, in a 
single semantic model, masonry characteristics and vulnerabilities. This comprehensive view allows for an 
objective definition of masonry quality by comparing qualitative and quantitative data. 


The systematic organization of masonry quality data with other relevant wall-related information, such as near- 
wall damage or the presence of lateral thrusts, is another innovative aspect. These data are presented as instances 
of classes, incorporating a range of characteristics. For instance, the damage is described not just descriptively but 
also as a potential symptom of an ongoing mechanism. Similarly, the representation of the roof as a structural 
element and a possible pushing element further enhances the objectivity of assessments. 


4. CONCLUSIONS 


This contribution belongs to the field of research debating the role that digital tools assume in the activities of 
investigation, documentation, representation, and analysis of the built heritage. In particular, the illustrated work 
proposes a workflow that aims to integrate the HBIM digital environment with an ontological structure, seeking 
to raise the semantic level offered by current digital models for the built heritage and in particular for the analysis 
of building systems from a structural point of view. 


In the last decade, various solutions for collecting, organizing, and managing cultural heritage information have 
given rise to a series of tools each with its database classification system, dedicated to representing a cultural 
artifact and its diverse contexts of investigation and interpretation. However, the same cannot be said of the built 
heritage, where on the one hand the complexity of the artifact itself and its historical evolution, and on the other 
hand, the presence of multiple disciplines in its processes of investigation, recovery, and intervention, have left 
the field effectively unexplored and lacking an organic approach to knowledge modeling. In this context, this 
article discusses the progressive adoption of two specific techniques - computer ontologies and the and HBIM 
models - highlighting their possibilities and ability to balance on the one hand the flexibility in dealing with the 
different disciplines involved on the other hand the rigor in information management necessary to effectively 
document the artifact. 


The real change, therefore, is not to be found in new models for cataloging and documenting the building and its 
aspects but, rather, in approaches capable of integrating and making consistent the different cognitive models of 
the built heritage, fostering mutual understanding and collaboration among the different skills involved in such a 
complex process as that of investigation and documentation. 


In this application, the primary focus was on the structural assessment of historic load-bearing masonry buildings. 
To achieve this goal, two new domain ontologies were developed, one for modeling masonry as a heterogeneous 
material and the other for defining vulnerabilities and related local mechanisms. This innovative contribution 
enables the semantic enrichment of BIM models using a Linked Data approach, effectively mapping the IFC and 
semantic model. 


The results obtained from applying the proposed methodology allowed for the identification of its strengths and 
existing open issues. Among the advantages achieved, the methodology provides a more objective basis for 
structural assessments by considering diverse modeling assumptions. Moreover, it offers potential benefits in 
training new preservation experts, as the inclusive representation of the structure, materials, and preservation state 
fosters a better understanding of the structural behavior in existing unreinforced masonry buildings. 


However, certain open issues require further development. One crucial aspect pertains to the integration of new 
domain ontologies into the methodology, especially for the development of global numerical models. This 
pioneering contribution lays the groundwork for future developments in this regard. Additionally, enhancing the 
user experience with the ontology is essential, and the creation of a platform that presents the ontology 
representation in the backend, capable of working with queries, and offering user-friendly interactions, would be 
highly valuable. 


Furthermore, applying the proposed methodology to a larger case study with more extensive data could yield 
additional insights and demonstrate further benefits. Such an expanded application would reinforce the 
methodology's effectiveness and real-world applicability. 


In conclusion, the development of the new domain ontologies and the application of the methodology present 
promising advancements in the field of structural assessment for historic masonry buildings. While it demonstrates 
numerous advantages, ongoing efforts to address open issues and explore potential enhancements will undoubtedly 
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contribute to the continuous improvement and adoption of this innovative approach. 
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ABSTRACT: Virtual reality (VR) offers promise as a tool for building performance simulations, especially when 
considering human-building interactions in buildings or spaces still under design. However, the absence of 
standardized data protocols impedes the consistent sharing of VR-related experiments and findings. This makes 
advancing VR experimentation as a reliable method for studying human-building dynamics challenging. The 
authors introduced the Virtual Human-Building Interaction Experimentation Ontology (VHBIEO) to address the 
challenge. VHBIEO seeks to standardize experimentation details as a domain-specific ontology, enhancing their 
interoperability. It includes essential experimentation concepts and employs semantic web technologies to ensure 
machine readability. Moreover, it integrates an application view (APV) to tailor details to specific experiments. 
Using VHBIEO-based metadata, this paper presents a case study aiming to standardize experiments that validate 
thermal sensations in immersive virtual environments (IVE), encompassing experimental protocol, variables, 
design, and data gathering. By exploring the main characteristics of VHBIEO-based metadata, the authors discuss 
its potential to improve the reliability of human-building interaction research. 


KEYWORDS: Ontology, Metadata, Human-building interaction, Occupant behavior, Virtual reality, Building 


1. INTRODUCTION 


Integrating virtual reality (VR) in studying human-building interactions has brought many novel opportunities to 
building design (Zhu et al., 2018). However, this integration is not devoid of inherent complexities. Foremost 
among these is the pressing requirement for rigorous standardization and systematic management of 
experimentation data. The present practice, characterized by non-standardized experimentation protocols, impedes 
researchers from optimally utilizing extant results and sharing their empirical findings efficiently. Therefore, 
advancing the reliability and validity of research within this domain necessitates the commitment to cultivating 
and adhering to the standardization of main experimentation data. 


The Virtual Human-Building Interaction Experimentation Ontology (VHBIEO) was designed to standardize data 
pertaining to virtual human-building interaction experimentation (Chokwitthaya et al., 2023). It was developed by 
extending the ontology of scientific experiments (EXPO) at the domain level (Soldatova & King, 2006). The 
construction of VHBIEO employed the DOGMA methodology, ensuring a detailed and interconnected internal 
structure (Jarrar & Meersman, 2008). This ontology incorporates terms and concepts from pre-existing ontologies 
and semantic models and emphasizes terminologies intrinsic to virtual human-building interaction experimentation. 
Notably, VHBIEO possesses attributes of machine readability, accessibility, and processability. Additionally, its 
structure integrates Application Views (APVs), facilitating the accommodation of distinct information tailored to 
specific applications. 


VHBIEO-based metadata thus contains the operational and application-specific information associated with VR- 
based experimentation. The metadata includes specific information about an experiment's components, such as the 
experimental protocol, design, setting, variables, and data collection procedures. The paper presents a case study 
showcasing the benefits of using VHBIEO-based metadata in retrieving virtual human-building interaction 
experimentation data. The case study specifically focuses on validating thermal sensation in an immersive virtual 
environment (IVE) and demonstrating how VHBIEO-based metadata can effectively capture and represent various 
elements of the experimental protocol, design, setting, variables, and data collection. The authors discuss how the 
metadata can improve the reliability of human-building interaction research by highlighting its main characteristics, 
including its use of the description logic, machine-readable, accessible, and processable, and inclusion of unique 
information through the application view (APV). 
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2. VIRTUAL HUMAN-BUILDING INTERACTION EXPERIMENTATION 
ONTOLOGY (VHBIEO) 


Virtual human-building interaction experimentation (VHBIEO) is an ontology developed specifically for the 
domain of virtual human-building interaction experimentation. It contributes to enhancing various aspects of the 
domain. First, it provides standardized information for VR-based experimentation, thus making information more 
consistent for researchers to share and reuse. Secondly, it allows for the production of machine-readable, accessible, 
and processable information. Finally, it aims at overcoming challenges caused by the diversity of experimental 
design and limitations of VR experiments. Therefore, VHBIEO can potentially accelerate the development of 
virtual human-building interaction experimentation as an emerging research approach. Furthermore, it can enhance 
collaboration between researchers, making it easier for them to share knowledge and build upon each other's work. 


VHBIEO extended EXPO (Soldatova & King, 2006) and reused terms and concepts from spatial-temporal event- 
driven modeling (STED) (Saeidi et al., 2018), the ontology to represent energy-related occupant behavior in 
buildings (DNAs) (Hong et al., 2015), ifcOWL ontology (Pauwels & Terkaj, 2019), the semantic sensor network 
ontology (SSN) (Compton et al., 2012), the survey ontology (SUR) (Scandolari et al., 2021), and the units of 
measurement ontology (UO) (Gkoutos et al., 2012). The development of VHBIEO followed well-defined ontology 
development approaches, namely ONTOLOGIES (Uschold & Gruninger, 1996), METHONTOLOGY (Ferndndez 
et al., 1997), Ontology Development 101 (Noy & McGuinness, 2001), and NeOn (Suarez-Figueroa et al., 2012). 
It comprised three major steps: initiation, construction, and evaluation. Competency questions (CQs) were used to 
regulate the development process. A total of fourteen CQs represented four major requirements, which included 
VHBIEO must 1) provide terms describing aspects regarding virtual human-building interaction experimentation, 
2) explicate its internal structure, 3) assist in the inclusion of unique information regarding particular experiments, 
and 4) promote machine-readable, accessible, and processable data files associated with virtual human-building 
interaction experimentation. VHBIEO used semantic web technologies to make it machine-readable, accessible, 
and processable. DOGMA methodology was applied to developing the internal structure of VHBIEO, which 
involves describing interconnectedness and commitment of terms, using Lexon and organizing groups of Lexons 
to support specific applications (Jarrar & Meersman, 2008). The commitment involves three groups, namely 
general, virtual, and in-situ commitments. The scheme of VHBIEO is illustrated in Fig. 1. APVs were established 
by adopting the concept of Model View Definition (MVD) implemented in Industry Foundation Classes (IFC) for 
allowing the inclusion of unique information for particular applications (Hietanen, 2006). The resources defined 
in VHBIEO are publicly available through a URL as https://w3id.org/vhbieo, and individual terms can be accessed 
using a unique URI as https://w3id.org/vhbieo#term. The ontology editor Protégé was used to support the 
development of VHBIEO (Musen, 2015). 
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Fig. 1: Scheme of VHBIEO (Chokwitthaya et al., 2023). 
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The evaluation of VHBIEO was performed, consisting of two parts: taxonomy evaluation and application 
evaluation. The taxonomy evaluation assessed the logical sequence, the completeness of the ontology, and the 
redundancy of terms. It ensured that VHBIEO provided a comprehensive and coherent representation of the 
domain of virtual human-building interaction experimentation. The application evaluation tested the ability of 
VHBIEO to describe and integrate information from real-world experiments. It also showed that VHBIEO 
promoted machine readability, accessibility, and processibility by providing several examples of querying 
information in the data files. VHBIEO was able to incorporate unique information (e.g., 7-point Likert scales) 
using APV. The evaluation revealed the efficacy of VHBIEO in providing standardized information and enabling 
machine-readability, accessibility, and processability, which is crucial for promoting consistency and accelerating 
the maturity of the virtual human-building interaction experimentation approach. 


3. CASE STUDY 


The case study aims to illustrate data described in VHBIEO-based metadata and prove that the metadata was 
machine-readable, accessible, and processable through query using a standardized query language (SPARQL). 
This objective is significant because it ensures the effective utilization and interoperability of the data marked in 
the metadata. When metadata is machine-readable, accessible, and processable, it can be leveraged to develop 
sophisticated analyses and applications. The accessibility and processability of the metadata mean that it can be 
dynamically engaged, enabling researchers and developers to manipulate and interpret the data efficiently, 
enhancing the comprehensibility and utilization of the information contained within the experimentation. 


The case study involves retrieving and analyzing information related to a VR-based experiment performed in an 
existing study (Rentala et al., 2021). The experiment primarily focused on evaluating the influence of outdoor 
temperature variations on participants’ thermal states within IVE. It was grounded in detailed and methodical 
experimentation aimed at dissecting the intricacies of thermal states in varying conditions, utilizing various events 
to simulate diverse environmental settings. The structure of the experimental data reflected the comprehensive 
nature of the research and the diversity of the variables considered. The variables range from environmental 
conditions (e.g., indoor and outdoor temperatures and humidity) to participants' physiological (e.g., skin 
temperatures at various body locations and heart rate) and perceptual responses (e.g., thermal perceptions at 
different body locations), providing a multidimensional perspective on the impact of outdoor temperature 
variations on thermal states in IVE experiments. These variables provide a robust and comprehensive dataset that 
allows for a thorough exploration and understanding of the thermal states of participants under different 
environmental conditions within IVE. The comprehensive dataset allowed the creation of the VHBIEO-based 
metadata, exploring all commitments and the majority of Lexons and rules defined in VHBIEO. In addition, the 
dataset enabled the exploration of APV since the experiment included the use of 7-point Likert scales, which 
needed customization in VHBIEO. 


The metadata was formatted in the Resource Description Framework (RDF) format, which is a standard format 
for representing ontologies and linked data. It was deployed on the Dataverse, a data repository platform, for 
testing. It was uploaded as a tabular data file that provided context and description of the data. It included 
information such as the experiment title, authors, funding sources, and descriptions of the data structure, variables, 
and observations. 


3.1 VHBIEO-based metadata 


The structure of VHBIEO-based metadata revolved around four core components of VHBIEO: experimental 
protocol, variable, plan, and data collection. This section discusses such components and their associated data 
pieces. The discussion aims to elucidate these components by exemplifying the key data elements pertinent to the 
experiment, thereby underlining the efficacy of VHBIEO in developing the metadata. 


3.1.1 Experimental protocol 


The experimental protocol, referred to as the backbone of the experiment, laid down the research's overarching 
strategy. This information was essential for understanding the core of the experiment. Fig. 2 delves into data 
intricately associated with the experimental protocol, illustrating several instrumental components, namely the 
protocol itself, experimental setting, hardware, and building, each playing a role in steering the experiment toward 
its intended objectives. 


Within the detailed protocol (expo:ExperimentalProtocol), a holistic introduction and thorough description 
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covering experimental statements, background, hypothesis, and methods sections - including inclusion and 
exclusion criteria - are encapsulated. Notably, the procedural roadmap for conducting the experiment, study 
timelines, visits, potential risks, and confidentiality assurances are also placed within this component. 


The settings utilized during the experiment are distinctly described using vhbieo: VirtualSetting and vhbieo:In- 
situSetting, denoting the use of a fully immersive virtual environment and a climate chamber setting, respectively. 


The hardware associated with these settings, integral to the execution of the experiment, is detailed through 
vhbieo:VirtualReality and vhbieo:In-situHardware. The critical element of a building, significantly tied to the 
protocol, is narrated via vhbieo:BuildingID. The building elements, representing architectural or design specifics, 
were meticulously described using the IffOWL ontology. 
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Fig. 2: Example of data associated with protocol. 
3.1.2 Variable, plan, and data collection 


While the concepts of variable, plan, and data collection are distinct, it is vital to recognize their interconnected 
nature. Variables define what data is collected, the plan dictates how to navigate and observe through the variables, 
and data collection strategies depict the methodology for gathering insights on these variables and aligning with 
the plan. Fig. 3, 4, and 5 exemplify this integrative relationship, offering a consolidated perspective through 
discrete examples and thereby facilitating a coherent understanding of the experiment's structure and methodology. 
Noteworthy is that the components spotlighted in each figure represent selective excerpts from the experiment, 
highlighting particular elements that significantly contribute to the focus of each example. To streamline and 
succinctly encapsulate the discussion, components not directly pertinent have been judiciously omitted. 
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Fig. 3: Example of data associated with skin temperature. 
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SECTION C - Al, DATA SCIENCE AND ANALYTICS 


Fig. 3 narrates variable, plan, and data collection components, specifically focusing on the human-skin temperature, 
particularly at the forehead. This temperature was identified as the target, or the dependent variable 
(expo:Target Variable), for the experiment. Its measurements were taken in degrees Fahrenheit, referencing the Unit 
ontology. The predictor or independent variable was set as the room temperature, with a controlled setting fixed at 
65 degrees Fahrenheit. This was described under vhbieo:PredictorVariable and vhbieo:PredictorCondition, 
respectively. The controlled contextual factor was temperature exposure (vhbieo:ControlledContextual Variable). 


The temperature exposure outlined the context (sted:Context) of heat exposure as a trajectory from a cold to a hot 
environment. Drawing upon this setting, the predictor condition and context conjoined to devise an event 
(sted:Event) introduced to a participant in the IVE. The human body was conceptually integrated as part of the 
building. Consequently, the responsive skin temperature was described in the dnas:BuildingSystem, linking it 
directly to the target variable of skin temperature. Invoking the concept of "state" from STED (as elaborated by 
Saeidi et al., 2018), a state embodies the dynamic status of the building system. It is susceptible to variations 
stemming from occupants! interactions and responses to the events they are subjected to. As a result, the skin 
temperature responses manifested in multiple states, including the temperature recordings before and after event 
exposure (sted:State). These recordings translated into the very act of skin temperature alterations encapsulated 
under dnas:Action. 


To hone in on data collection specifics, Fig. 3 illustrates using the Vernier surface temperature sensor under 
ssn:Sensor. This sensor was strategically positioned on the forehead described under vhbieo:SpecificLocation of 
the participant, uniquely identified by dnas:Occupant-ID. The resulting data was systematically chronicled in 
ssn:Observation. It showed a recorded forehead skin temperature of 91.5 degrees Fahrenheit, timestamped 
precisely to mark the experiment's execution on 23" April 2019. 
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Fig. 4: Example of data associated with thermal sensation. 


Fig. 4 provides a detailed examination of the variable, plan, and data collection components with a concentration 
on thermal sensation—another distinct target variable within the experiment. The structures of the variable and 
plan components mirrored those elaborated upon in Fig. 3. However, the method of data collection exhibited 
notable differences. 


In the context of thermal sensation, data was accumulated through a survey mechanism, described in sur:Question. 
Significantly, the application view (APV) was utilized to annotate and clarify the 7-point Likert scale, which 
constituted the unique response choices in the experiment. This scale was referenced as 
(vhbieo4Thermal:ThermalSensationAnswerChoices). The nature of this question aligned with a thermal 
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perception survey, a standard defined and endorsed by ASHRAE (American Society of Heating, Refrigerating and 
Air-Conditioning Engineers) and cataloged under (vhbieo:SurveyType). The culmination of this survey's execution 
was preserved under sur:CompleteQuestion, which depicted the participant's thermal sensation as being "slightly 
warm". It was anchored with a timestamp, noting the date of conducting the experiment. 
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Fig. 5: Example of data associated with age. 


Fig. 5 shifts its focus from target variables and delves into an uncontrolled contextual variable encountered in the 
experiment—specifically, the age (vAbieo:UncontrolledContextual Variable) of a participant. The unit used to 
measure age was "year", which draws reference from the Unit ontology. For data collection pertaining to age, a 
demographic survey (vibieo:SurveyType) was employed. Participants were prompted with a straightforward 
question, "What is your age?", which was recorded under sur:Question. Responding to this question required 
participants to fill in a blank space, indicating their age, and this form of response was described in sur: Answer. 
In a manner akin to the thermal sensation data collection, the resultant age data was encapsulated under 
sur:CompleteQuestion. This documented an instance of a participant's age being 24 years old. 


3.2 Querying VHBIEO-Based Metadata 


The executed data query was instrumental in validating the machine-readability, accessibility, and processability 
of the metadata. To resonate with the provided examples, the query was formulated to extract specific data points 
corresponding to variables such as the skin temperature at the forehead, thermal sensation, and age. In conjunction, 
the computation of averages for each variable within the query further attested to the metadata's machine- 
processable nature. This ability to programmatically access, read, and compute values from the metadata 
substantiates its utility and robustness in supporting research and data-driven applications. 


Fig. 6 presents the specific queries crafted to extract data pertaining to the skin temperature (Fig. 6a), thermal 
sensation (Fig. 6b), and age (Fig. 6c) of all participants. These queries are structured to target the data points within 
the dataset precisely. It also conveys the outcome of the queries. Notably, the result encapsulated the computed 
averages of the obtained data, underlining the machine-processable nature of the metadata. 


For Fig. 6a, the query primarily focuses on obtaining data related to the forehead skin temperature of participants. 
Such data held significant relevance, as skin temperature could provide insights into a participant's 
thermoregulation response in a given environment. By furnishing both individual temperature data points and an 
overall average, the query aided researchers in discerning patterns and anomalies within the data. 


The emphasis in Fig. 6b shifts from physiological responses to perceptual experiences. Querying the thermal 
sensation looked into how participants subjectively felt about the thermal environment they were placed in. The 
provision of an average sensation value offered a summarised perspective. Such data could be vital in studies 
aiming to align objective environmental parameters with subjective human comfort levels. 


The query in Fig. 6c acknowledges the demographic diversity of the participants. Age is a potential confounding 
variable in many human studies, as age might influence a person's physiological or perceptual response to 
environmental factors. Researchers could factor in age-related nuances in their analysis by procuring both 
individual age data and its average. 


In essence, these queries demonstrate the versatility and depth of VHBIEO-based metadata. Furthermore, the 
structure of each query, particularly the incorporation of averages, emphasizes granularity and summarization, 
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ensuring that the metadata is machine-readable, accessible, and processible. 


PREFIX expo: chttp://wew. bozo. jp/owl /EXPOApr19/> 

PREFIX rdf; <htt] www. w3.0rg/1999/02/22-rdé-systax-nst> 

PREFIX rdfs: <ht jwaww) Or3/2000/01/rdt-schenat> 

PREFIX sosa: <htip://www. w3. org/na/sosa/> 

PREFIX vibieo: <hħttps://v3id.org/vhbieoë> 

SELECT ?occupant(D property ?tenpersture ?location ?Average_forehead_tempersture 
WHERE { 
t 


SELECT (AVC(?Result} AS 2Average forehead tenperatura} {"" AS ?occupantID) 
{"" AS property} ("" AS ?temperature) ("7 AS Zlccation) (*" AS 7Result) 
WHERE { 

?individual a sosa:Observation . 

?individual ?property ?value_ . 

FILTER (?value_ = vhbieo:HBLO00002) 

Pindlvidual sosa:hasSimpleResult ?Result . 


OccupantiD Property 


UNION 


PeccupantID a sosa:Observation . 
oceupantID ?property_ ?value_. 
FILTER (?value_ = vhbleo:8L000002) 


OPTIONAL (?value_ rdfs:comment ?location } 
PeccupantID ?property ?tenperature . 
FILTER (tproperty = sosashasSinpleResult) 
BINDI"" AS ?Average_forehead_temperature) 
H 
ORDER BY ASC(?Average_forehesd temperature) 


PREFIX expo: <http 
PREFIX 


i //uuu hozo. jp/owl /EXPOApri9/> 

} Awu. w3 .0xr3/1999/02/22-rdf-systax-ns#> 

PREFIX g (www .w3.0rg/2000/01/cdf-schemet> 

PREFIX sosa; <httpi//www.wt.org/ns/sosa/> 

PREFIX vhbieo: <httpa://w3id.org/vholeot> 

SELECT ?cccupantID ?property ?temperature Plocation ?Average forehead sensation 
WHERE { 


SELECT (AVG(?Result) AS ?Average_forehead_sensation) ("" AS Poccupantid) 
("" AS 2property) ("" AS 2aeneation) ("" AS Plecation) ("" AS Result] 
WHERE į 

individual a sur:CompleteGuestion . 

Tindividual 7property_ 7value_ . 

FILTER (?value_ = vhbieo: TAV00D006) 

individuel sosa:hesSimpleResult Result . 


7occupantID a sur:CompleteQuestion . 
JoccupantID ?property_ ?valve_ . 
FILTER (?value_ = vhbieo:TAV000006) 


OPTIONAL {?value_ rdfs:comment ?location | 
JoccupantID ?property ?sensation . 
FILTER (?property =- sosa:hasSimpleResult) 
BIND{"* AS ?Average_forehead_sensation)} 

H 

ORDER BY ASC(7Average_forehead_sensation) 


PREFIX expo; <http://www. hozo, ip/owl/EXPOApr19/> 
PREFIX rdf: <http://www.w3.0rg/1999/02/22-rdf-systax-nsa> 
PREFIX rdfs: <http://wwv.w3.org/2000/01/rdf-schenat> 
PREFIX sosa: <http://www.w3.org/ns/sosa/> 
PREFIX vhbieo: <https://w3id.org/vhbieot> 
SELECT 7occupentiO property ?age 7Average_age 

WHERE | 

( 


SELECT (AVG(?Result) AS Average _age) ("7 AS PoccupantID) 
(°" AS 2propesty) ("" AS tage) {°" AS ?Result} 
WHERE { 

individual a sur:CompleteQuestion - 

?individual ?property_ ?value_ . 

FILTER (?value_ = vhbieo:DCV000003) 

2individual sosa:hasSimpleResult ?Result . 
) 

t 

UNION 

{ 


9122 

OCCWOL hasSimpicResult 91.57 forehead 

OCC? hasSimpleResult 93.00 forehead 

OCCH003 hesSimpleResult 9438 forehead 

occas basSimpleResult 92.80 forehead 

OCCHOOS hasSimpleResult 95.84 forehead 

OCCWO6 hasSimpleResult 97.11 forehead 

(a) Skin temperature at forehead. 
Thermal Average forehead 
Occupant) Property sensation Boat thermal sensation 
062 

OCOD! hasSimplicResult -2 forehead 

OCCO? —__hasSimpleResult 1 forehead 

OCO0003 hasSimpleResult 0 forehead 

ocon hasSimpleResult 0 forehead 

OCOO00S hasSimpteResult -I forehead 

OCCOOO6 hasSimpleResult -I forehead 

(b) Thermal sensation at forehead. 

oe 
OCCUOOL hasSimpleResult 24 forehead ` 
OCC0002 hasSimpleResuh 26 forchead 

OCC0003 hasSimpleResult 21 forehead 

OCCHO04 hasSimpleResulit 34 forehead 

OCCI005 hasSimpleResult 42 forehead 
„Occu hasSimpleResult__ 32 forchead 


FoccupantID a sur:CompleteQuestion . 
JoceupantID Pproperty ?value_. 
FILTER (?value_ = vhbieo:UCVd00003) 
?occupantiD ?property ?age . 
FILTER (?property = sosa:hasSimpleResult) 
BINDI"* AS TAverage_age) 

H 

ORDER BY ASC (?Average_age) 


(c) Age of participants. 


Fig. 6: Queries and their results. 
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4. DISCUSSION 


The primary objective of this work was to demonstrate the efficacy of the Virtual Human-Building Interaction 
Experimentation Ontology (VHBIEO) in supporting the development of VHBIEO-based metadata. This metadata 
aims to create a structured representation of experimental data, promoting machine-readability, accessibility, and 
processability. Reflecting upon the objective, development of VHBIEO-based metadata, and query results, this 
section broke down the accomplishments, implications, and broader impact on the research horizon. 


Systematic Representation: At its core, VHBIEO is an ontology that aims to systematically represent knowledge 
in the domain of human-building interaction experiments. The VHBIEO-based metadata took advantage of this 
structured knowledge representation, ensuring that every aspect of an experiment was adequately documented, 
from the broader experimental protocol to the minutiae of data collection methodologies. 


Data Accessibility and Processability: The design of VHBIEO ensures that data is machine-readable, accessible, 
and processible. This was evident in the queries we discussed (e.g., Fig. 6), which retrieved data efficiently and 
could also compute statistics such as averages, showcasing the power and flexibility of this ontology-based 
metadata. 


Facilitating Advanced Analyses: The granularity offered by the VHBIEO-based metadata, segmenting data into 
categories like skin temperature, thermal sensation, and age, paved the way for more complex and nuanced 
analyses. Discerning patterns or influences among these variables became relatively straightforward when they 
were clearly organized. 


Overall, VHBIEO provides standardization for virtual human-building interaction experimentation as a robust 
mechanism for associated experimental data. Such standardization diminishes the scope of ambiguities, ensuring 
the data remains consistent across different stages and platforms of its utilization. Furthermore, data exchange 
across diverse platforms and researchers is inevitable. VHBIEO potentially enables researchers to transfer, match, 
and integrate data from varied sources without the nuances of interpretation. This is akin to creating a seamless 
bridge where data flows without errors. 


5. LIMITATION 


Despite the evident advantages and the transformative potential of VHBIEO, certain limitations need to be 
considered. 


Continuous Maintenance and Refinement: A pressing challenge associated with ontologies, VHBIEO being no 
exception, is their continuous upkeep. Changes in domain knowledge or enhancements in ontology capabilities 
require periodic revisions. This process demands not just a technological revamp but also the infusion of domain- 
specific expertise to ensure the ontology stays relevant and accurate. 


Potential Overlaps with Existing Ontologies: It is conceivable that some terms introduced in VHBIEO might 
overlap with those in existing ontologies or semantic models. The authors, in their initial sweep, might not have 
identified these overlaps. Should such duplications be spotted in the future, VHBIEO will be updated to reflect a 
more harmonized structure. 


Scalability with Evolving VR Technology: Virtual Reality (VR) technologies are in a state of flux, constantly 
evolving. As VR matures, simulating intricate simulations over prolonged durations might become feasible. In 
such scenarios, the foundational structure of VHBIEO, anchored on STED, might demand reevaluation and fine- 
tuning. 


Potential Limitations in Collaborative Experiments: As research becomes increasingly collaborative, the 
experiments often span multiple geographies and phases. This dynamic nature of collaboration, marked by 
continuous data exchanges and iterative updates, could pose challenges for VHBIEO in its current form. Future 
iterations of VHBIEO will need to address this aspect to stay relevant in an interconnected research ecosystem. 


Lack of Advanced Features in APV: Although the current APV supports unique terminology descriptions tailored 
for specific experiments, it falls short in several other features. Features such as internal structure customization, 
automation capabilities, and bridging with other ontologies are crucial for enriching VHBIEO's utility. The 
integration of these features could significantly expand its scope and application. 


798 


6. CONCLUSION 


The study showcased the advantages of using VHBIEO in human-building interaction research. By providing a 
structured approach to document experimental protocol, design, settings, variables, and data collection, VHBIEO- 
based metadata paves the way for better experiment reusability, comparability, and reproducibility. This is 
especially crucial for experiments relying on [VE-based experimental information and results. Yet, it is crucial to 
acknowledge the limitations inherent in this preliminary implementation. Specifically, the research's scope was 
constrained mainly due to the limited number of validation cases. As the field progresses, it is essential to 
continually refine VHBIEO by addressing these limitations and validating its utility across a broader array of 
experimental contexts. 
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ABSTRACT: Cost estimation for tendering is one of the leading causes of legal disputes in the architecture, 
engineering, construction, and facilities management (AEC/FM) industry. 


The lack of a standardised support procedure to verify the association of cost data with the objects model causes 
waste of time and inaccuracy in the cost estimation. 


This research work, starting from a previous study where the research group integrated a cost domain in the IFC 
data schema, investigated the possible applications of this IFC based cost domain integrated with an IFC 
geometrical information model. The current paper investigates a specific case study focused on a structural model 
to verify current and future applications. 


Furthermore, rules for BIM information requirements will be defined through the Information Delivery 
Specification (IDS) to ensure an easy way for humans and computers to understand it. This will allow to specify 
which data must be present in the geometric model to subsequently ensure validation and verification of uniqueness 
of the cost data associated with geometric data. 


The results show the possibility to define a structured cost items in IFC associated through relationships to other 
entities and then verify their association to geometric data to guarantee its consistency and uniqueness. 


KEYWORDS: IFC, cost ontology, BIM, cost item, Information Delivery Specification (IDS) 


1. INTRODUCTION 


Cost estimation is one of the most critical tasks and still unresolved problem in the architecture, engineering, 
construction, and facilities management (AEC/FM) industry. 


To be able to obtain a cost estimate for a building, it is generally necessary to classify all objects in the building 
project using articles and to record their quantities. Although this is an objective process, human errors can often 
be encountered relating to both the incorrect association of prices and the incorrect calculation of quantities. One 
of the problems facing the AEC industry today is precisely the lack of a standardised support procedure to verify 
the association of cost data (Adeli et al., 2001; Lu et al., 2016). With the advent of BIM, computing tools have 
changed and evolved digitally. Wu et al., (2014), Sacks et al., (2018), Elghaish et al., (2020), and Olatunji et al., 
(2021) reported on the possibilities of BIM to improve and support cost estimation, but the approach to computing 
has remained the same. So, the problem remains; in fact, while previously the costs were associated with 
measurements, they are now associated with model objects but there is no certainty that this association is correct 
and consistent. 


Currently the computation software receives the information from a model exported in an open format, Industry 
Foundation Classes (IFC) and retains the cost listing within it. With this study, the aim is to compensate for the 
lack of standardisation in cost validation by creating an IFC-based cost semantics, thus identifying a common 
language between model objects and costs. 


The IFC is standardized according to ISO 16739-1. This could provide a solid basis for the exchange of information 
resources between information systems (Froese T et al., 1999). The IFC standard published by Buildingsmart 
International plays a very important role in the process of exchanging BIM data between the various participants 
in a building construction or management project, as it is an open specification. IFC provides some entities to 
represent information in building management, including [fcConstructionManagementResource (building 
resource) JfcWorkPlan (planning), JfcTask (task), IfcScheduleuleTimeControl (task time information), 
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IfcCostSchedule (cost planning), [fcCostItem (unit cost estimation item) and [fcCostValue (value). 


This research work, starting from a previous study in which the research team integrated a cost domain into the 
IFC data model (Cassandro et al., 2023), investigates the possible development of a standardised support procedure 
to verify the uniqueness and correctness of the association between the new cost items and their written information 
within the IFC standard and the geometric objects contained in a specific case study, a structural model. 


Currently, in Italy, the cost items are contained in the list of public works (in the specific case of the Price List of 
the Lombardy Region) a document based on unstructured data and characterized by a natural language. 


Starting from a work previously developed by the research team (Cassandro et al., 2023), it was possible to initially 
create an IDS file for the definition of the requirements that must be present in the geometric model. This is 
fundamental to guarantee both the correct association of the entities of cost but also the analysis and the 
interrogation of the data in the successive phases of verification. Subsequently it is possible to verify the 
association between the cost items and the geometric objects to ensure the uniqueness and correctness of the cost- 
object relationships created. 


Specifically, this would allow to: 


e check the correct price association; 

e ensure validation, uniqueness and comparison between attributes of a certain cost class and the attributes 
of the object to which it is associated; 

e consider cost elements as standardizable and query-able computer classes. 


The example of a structural model has been taken and a structure of relations between costs and geometry has been 
created to allow the verification of the uniqueness of associated data and validate the code developed. 


The paper is structured as follows. First an analysis of the existing literature on cost estimation via IFC classes, 
current methods of checking compliance and the Information Delivery Specification standard (IDS) is presented. 
Currently there are not BIM authoring software that can write the /fcCostItem entity and as a result you cannot 
check this information; so, it was decided to rely on the IfeOpenShell library to initially create the cost items and 
then verify the association and accuracy of the data of the entity through the code developed for this research. 


2. THEORETICAL BACKGROUND 


Within the BIM process, one of the most relevant parts is undoubtedly the validation of the model and the 
information it contains for a proper exchange of data. There may currently be different types of project’s validation, 
or model checking, and they may be the following: 


e verification against geometry (clash detection); 
e verification against design settings (e.g. the specifications within the BEP, BIM validation); 
e checks against regulations (code checking) 


The purpose of these checks is to ensure that all data entered within the model is correct from a geometric and 
informative point of view, and that it meets all standards. 


Currently, this check is a process that, while being facilitated using software and tools, still remains time- 
consuming, expensive and prone to errors (Dimyadi & Amor, 2013). The main problem of model checking always 
concerns the validation process (Ghannad et al., 2019). 


2.1 Model Checking 


Model checking is a key element in information modeling and management (Ciribini et al., 2015). In standard 
design processes, according to studies, only 5-10% of the information content of the project is systematically 
checked (Trebbi et al., 2020). 


Clash detection means the verification of geometric interference; this could become a problem during the 
construction of the work which is not checked in advance within the 3D model (Akponeware & Adamu, 2017). 
These are divided into two types; the "hard clash" referred to two objects that collide and occupy the same physical 
space, and the "soft clash" referred to objects that do not collide but are too close. 
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Code checking is the verification of the compliance of the digital model with the corresponding regulation (Trebbi 
et al., 2020). The use of specific software that supports these controls can reduce time and error, thereby improving 
several aspects of building design, including efficiency and model quality (Greenwood et al., 2010). 


It is essential to verify the compliance of the models with regulatory and technical requirements, and therefore an 
automatic control of the frequency and uniqueness of the information would have a significant value within the 
AEC industry (Solihin & Eastman, 2015). Furthermore, it may be necessary to verify compliance with the 
requirements of “Employer Information Requirements” (E.I.R.) or of “BIM Execution Plan” (B.E.P.) (PAS 1192- 
2:2013). 


The first study on automated code compliance checking is the Singapore project CORENET (Construction and 
Real Estate Network) an initiative based on the complete integration of the life cycle phases. Similarly, in the USA 
SMARTCodes was born and Autodesk Revit provided some plug-ins such as UpCodesAI which supports some 
parts of the International Building Code but also some parts of other standards from other jurisdictions, in Australia 
DesignCheck. 


2.2 Existing Applications 


Model checking is normally done by use of standalone applications as Solibri Model Checker, SMARTcodes, 
ePlanCheck, AEC3 Compliance or EDM Model Server (Ismail et al., 2023). An often used example of model 
checking is clash detection to validate if for example different types of pipes intersect each other. Another example 
can be to check if the width of the doors is according to codes of accessibility in the regulations or national 
standards. The most used for model data verification are Solibri Model Checker and Naviswork. 


Solibri Model Checker (SMC) is a prominent BIM software application which assist designers in visualizing any 
issues or problems regarding the design model before and during construction. It is one of the few software 
packages that leaves the end user with a minimum of scope for action. The rules set in SMC are set for the 
Norwegian State Administrative Agency handbook but can be modified by the end user by changing the rule set 
or deleting some. The creation of new rules is possible but has limitations. To be able to create new rules, it is 
necessary to act on the API, which is not public. 


Naviswork, is one of the most widely used tools on clash detection and coordination of models from different 
disciplines. The software detects intersections or conflicts between elements in the 3D model, helping to identify 
and resolve construction or design issues promptly, reducing errors and costs during project execution. 


3. PROBLEM STATEMENT 


As can be seen from the literature analysis above, typically the verification is done within the geometric model 
considering only one domain (that of the model itself). Instead, the goal of the research is to validate data between 
a plurality of domains linked together (in this case the geometric domain and the cost domain) and that can be 
contained in the same model. The architectures of cost items cannot be considered exclusively as strings of text in 
natural language because they are not machines readable. For this reason, to verify and validate the consistency 
and uniqueness of the data, it is necessary to structure according to a semantic defined cost items in more complex 
architectures thus creating a cost domain. This should ensure that the consistency of associated data between cost 
and geometry can be verified. 


Starting from this statement, the research focuses on the key aspect of: 


- How to define a procedure for checking and verifying data between geometric and cost domains. 


4. RESEARCH AIM & METHODOLOGY 


This research investigates the development of a standardized support procedure to verify the correspondence 
between the cost items and the data they contain with the objects contained in the information models. 


The cost data in this research are stored in a new cost database based on architectures developed in openBIM 
format and structured according to the IFC data model (Cassandro et al., 2023); currently, in Italy and in the 
specific case of the list of public works of the Lombardy Region, cost items are present in unstructured format 
within textual documents in natural language. This causes problems and possible errors both in the association and 
in the verification of the associated costs. In fact, currently one of the most challenging issues for building design 
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compliance checking is the translation of human-readable rules/documents into a computer processable code to 
allow the understanding also by computer tools. 


For this work a specific case study focused on a structural model containing cost items, structured and in IFC 
format, already associated with their geometric objects, is analazyed to verify the current and future applications 
of this methodology. 


Furthermore, rules for BIM information requirements are defined through the Information Delivery Specification 
(IDS) to ensure an easy way for humans and computers to understand it. This allows you to define which data 
must be present in the geometric model to ensure first a correct cost association and later validation and verification 
of the uniqueness of the cost data associated with geometric data. 


First, it is described the state of the art of current practices and research related to compliance checking. The idea 
is to standardize cost element data as a structured class in the IFC data model. This in fact contains a set of attributes 
that allow cost data to be stored, as is already the case for model objects. 


A simple unit cost database has been created (the new digitised price list) based on IFC files relating to cost items 
of structural works (concrete casting, reinforcement laying, formwork laying, etc.). These files can be called in 
any geometric model for the definition of the cost to associate to the objects of the model. 


The next step is characterized by the definition of the requirements that must be present in the geometric model 
for a better and correct association between IFC entities, [fcCostltem and IfcElement. 


It is defined the Information Delivery Specification (IDS) that will be delivered to the modeler and after that a 
code is developed that will allow to verify if the association of the cost is coherent or not with the object identified. 


The entity will allow to translate the current cost items, in natural language, in a defined and standardized data 
structure (definition of the framework and the semantics of the costs through the entity [fcCostItem). 


The methodology adopted is characterized by the steps presented in Figure 1. 
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Fig. 1: Research Methodology 


5. KEY CONCEPT 


The IFC, an open and interoperable standard, aims to "allow interoperability between industrial processes of all 
different professional sectors in civil engineering projects by allowing IT applications used by all project 
participants to share and exchange information about the project" (BuildingSMART, 2023). It is an open 
international standard, standardized according to ISO 16739-1:2018; it is designed to be a vendor-independent 
data model and usable in a wide range of hardware devices, software platforms and interfaces for many different 
use cases. 


5.1 IfcCostItem 


IfcCostltem is a non-geometric entity, subclass of [fcControl, within IFC. IfcCostItem describes a cost or financial 
value with descriptive information that describes its context (BuildingSMART, 2022). It represents the cost of 
assets and services, the execution of works by a process, lifecycle cost, cost estimates, budgets, and more. 


IfcCostltem is also described through a set of attributes. Some of them are inherited instead the attributes 
PredefinedType, CostQuantites and CostValue are those owners of the class. An /fcCostItem can link one or many 
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IfcCostValue's representing a unit cost, total cost, or a unit cost with one or many quantities used to generate the 
total cost. The quantities can be given as individual quantities, or those quantities are provided as element quantities 
by one or many building elements. Another key aspect is that ZfcCostItem can activate some different relationship. 
Among these it can be nested to create cost assemblies through the relation [fcRe/Nests, it can be assigned to a 
IfcProduct through the relation [fcRelAssignsToControl but may also have a product associated through the relation 
IfcRelAssignsToProduct or a resource through the relation [/cRelAssignsToResource. 


5.2 Information Delivery Specification 


The Information Delivery Specification (IDS) is a standard defined by BuildingSMART to define the required 
level of information in the specific project (BuildingSMART, 2023). IDS defines the information requirements 
that a geometric model must contain for the correct exchange of data in a way that is easily readable by humans 
and interpretable by the machine. It defines how to deliver and exchange objects, property, even values and units 
of measurement. An IDS file may contain several requirements independent of each other and without reference 
to other requirements in the file. This allows you to create replicable blocks and use them in different files. 


Currently the information requirements are shared through excel sheets or PDFs; these are not directly interpretable 
by a machine and difficult to read by people given the large amount of data. 


The IDS focuses on ‘information delivery specifications’ defining what information is needed and how it should 
be structured. This should improve automated workflows by receiving information that can be processed 
automatically. The definition of an IDS also serves to standardize all the different approaches that there may be in 
modelling. An example of different approaches may be the use of slabs instead of landings. 


6. EXPERIMENTATION & RESULTS 


In this article, part of a larger research work carried out by this research group, the uniqueness and correctness of 
the association of cost items with geometric objects has been analysed and validated. This has been achieved by 
defining a standardized data structure translated within the IFC standard, as cost data is in natural language 
(unstructured data). This allows you to define a database of cost data within the IFC standard through the classes 
currently present (/fcCostItem, IfcCostValue, etc.) and the relationships that these can activate. 


The current paper investigates a specific case study focused on a structural model. 


Not being the objective of the article and having been analysed in (Cassandro et al., 2023), it will not be explained 
as it has been defined the structure and the relations of the single item of cost inside the standard IFC and as this 
is related to the geometric entity. Figure 2 shows a simplified example of the possible architecture behind the cost 
item related to the concrete casting for a foundation; this item like all the others will be stored in the new database 
of cost items. 
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Fig. 2: Simplified example of architecture behind the cost item related to the concrete casting for a foundation. 
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All data used and related to cost items come from the price list of the Lombardy Region. In Italy the estimation of 
the prices in public tendering takes place using a price list. Each region has a catalogue containing price items, 
called price list, which is the basis of the economic offer and regulates payments in public contracts. 


6.1 Model Requirements 


Starting from a detailed analysis of the cost items, the minimum requirements (Level Of Information Need) that 
the geometric model must contain to ensure the subsequent association, verification and validation of data between 
geometric objects and cost items have been identified. It was possible to define the basic requirements to be 
delivered to the modeler on the basis of a work of analysis and breakdown of the current cost items for the 
identification of a new standardized architecture in which to insert and structure the current cost items. 


The information that a single geometric entity must have in the model (called Facet in the standard IDS) have been 
defined. In the first part of the facet (applicability section), it was defined to which type of objects the specification 
applies and then it was defined the requirements (requirements section) that is required for the objects specified in 
the first part, such as required properties or classifications. Each specification has metadata (name, description, or 
instructions) to help describe the goals and instructions of how to achieve it before the applicability section (Figure 
3, Figure 4). In the following example we ask as fundamental requisites that all /fcS/ab entities of the geometric 
model (applicability) have compiled both the attribute "PredefinedType" according to the specific values defined 
by the standard (only these values are accepted: BASESLAB, FLOOR, LANDING, ROOF, NOTDEFINED, 
USERDEFINED) and the attribute "Name" with unspecified value (requirements). 


THE ENTITY IFCSLAB MUST HAVE NAME AND PREDEFINEDTYPE 


ALL SLAB DATA 


SHAL BE SLAB DATA WITH A TYPE OF EITHER BASESLAB or FLOOR OR LANDING or ROOF or NOTDEFINED or USERDEFINED 
THE NAME SHALL BE PROVIDED 


Fig. 3: Simplified IDS user visualization with "IfcTester" web application. 


<ids:specification ifcVersion="IFC4" name="The entity IfcSlab must have Name and PredefinedType" minOccurs="0" maxOccurs="unbounded"> 
<ids:applicability> 
<ids:entity> 
<ids:name> 
<ids:simpleValue>IFCSLAB</ids:simpleValue> 
</ids:name> 
</ids:entity> 
</ids:applicability> 
<ids:requirements> 
<ids:entity> 
<ids:name> 
<ids:simpleValue>IFCSLAB</ids:simpleValue> 
</ids:name> 
<ids :predefinedType> 
<xs:restriction base="xs:string"> 
<xs:enumeration value="BASESLAB" /> 
<xs:enumeration value="FLOOR" /> 
<xs:enumeration value="LANDING" /> 
<xs:enumeration value="ROOF" /> 
<xs:enumeration value="NOTDEFINED" /> 
<xs:enumeration value="USERDEFINED" /> 
</xs:restriction> 
</ids:predefinedType> 
</ids:entity> 
<ids:attribute minOccurs="1" maxOccurs="1"> 
<ids:name> 
<ids:simpleValue>name</ids:simpleValue> 
</ids:name> 
</ids:attribute> 
</ids:requirements> 
</ids:specification> 


Fig. 4: IDS document in machine-readable xml format. 


The rules for the BIM information requirements that have been defined have been collected in Table 1. Two 
examples of requirements for modeling structural and non-structural foundations are given. We can see how the 
"Req.1" defines the requirements that each individual object of the model exported as /fcSlab.BASESLAB with 
Loadbearing value false (applicability) must have. 


The ACCA software "usBIM.IDS" was used to verify the requirements and the correctness of the geometric model. 
It was therefore possible to verify the information contained in the geometric model and to detect discrepancies 
from the requirements initially defined and necessary. 
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Table 1: Examples of requirements for modeling structural and non-structural foundations 


Requiremt 1 


Applicability Requirements 
IfcSlab.,BASESLAB Attribute 


Loadbearing FALSE 


Pset_ConcreteElementGeneral 


Name 
ConstructionMethod 
StrengthClass 
ExposureClass 


StructuralClass 


In Situ 


C16/20 


X0 


S4 


Pset_SlabCommon 


FireRating 
IsExternal 
LoadBearing 


Status 


Qto_SlabBaseQuantities 


Depth 

Width 
Length 
Perimeter 
GrossVolume 


NetVolume 


Requirement 2 


IfcSlab. BASESLAB Attribute 


Name 


Loadbearing TRUE 


Pset_ConcreteElementGeneral 


ConstructionMethod 
StrengthClass 
ExposureClass 
StructuralClass 
ReinforcementVolumeRatio 


ReinforcementStrengthClass 


In Situ 


C25/30 


XCl 


S4 


Pset_SlabCommon 


FireRating 
IsExternal 
LoadBearing 


Status 


Qto_ SlabBaseQuantities 


Depth 

Width 

Length 
Perimeter 
Gross Volume 


NetVolume 


6.2 Verification of the uniqueness and completeness of data 


As already widely discussed in the section of “RESEARCH AIM & METHODOLOGY?” the article aims to identify 
a method of verification of the uniqueness and correctness of the association between cost item and geometric 
object. 
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Starting from structured cost data and a geometric model that meets the information requirements identified by 
IDS, a code has been developed for the association of IFC cost entities - [fcCostItem with IFC geometric entities - 
IfcElement (will not be discussed in this article), and after that another code has been developed for the verification 
and validation of the correctness and uniqueness of the process of association of cost items to geometric objects. 
This verification process is completely different from the current ones; in fact, currently the verifications are 
focused exclusively within the same domain. What the research does is verify the correctness and uniqueness of 
the information between two different domains (in the specific case geometric domain and cost domain). 


The developed code was implemented using Python 3.10, IffOpenShell, an open-source library (IfeOpenShell 
v0.7.0) and the IFC4 ADD2_TC1 - 4.0.2.1 (currently the official version). 


Through the developed code it is possible to perform an analysis of the geometric model containing the price items. 
The analysis involves a detailed verification of the information contained in the geometric object of the model 
(PropertySet, Material, etc.) against the information contained in the price item (Sample Element, PropertySet, 
Material of Sample Element, etc.) stored in the cost database. In fact, the architecture of the cost item is not present 
inside of the analyzed geometric model, which contains instead the single entity of cost UfcCostItem) useful for 
the definition of the estimate of the costs (cost schedule); this retrieves some key data from the unit cost item, such 
as name, description, or unit cost value /fcCostValue). This will allow to maintain a constant relationship between 
the item from cost estimation and the unit cost item without weighing down the model of information stored in an 
external queryable database (Figure 5). 
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Fig. 5: Relation between geometric object data and cost item data 


Due to the scope of the article the verification was performed on a limited number of geometric entities of the 
structural model: a sub foundation, a foundation, a slab and concrete masonry. Each of these entities is made of 
concrete cast in place and armed but having different characteristics (exposure class, strength class, structural class, 
etc.). 


The data analysis begins by first identifying the geometric entities to which a cost had been associated. After that 
through a user interface based on manual input it comes chosen which relation object-cost to analyze. Starting 
from this choice, the code allows to extract the cost item associated with the geometric entity and, through a 
predetermined key (/fcCostItem.Name + IfcCostItem.Description), search the corresponding unit cost item within 
the cost database. Once the cost item is identified, the code proceeds by analyzing the properties and extracting 
the data to be verified; for example, the sample element associated with the cost item and the relative 
Pset_SlabCommon are analyzed to understand if it is a structural element (Loadbearing = True) or non-structural 
(Loadbearing = False), Figure 6. 
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#97=1fcCostItem('1$8jCZH8bFE97CyJuzD_2b',$,'Concrete casting for foundation layer', ’...’,.USERDEFINED., (#96), (#95) ) 
- #45=IfcRelAssignsToProduct ('3f5uwtK8r0ivS67zJKTVQU',S,'Rel cost-element','Rel between cost item and sample element", (#12) ,$, #14) 
- #14=IfcSlab('2942YU3sLCEVWbpvY3ug5Y',$,'Sample element of concrete foundation slab', ’...’,$,$,$,$,.BASESLAB.) 
= IfcSlab 
- BASESLAB 


> #24=IfcRelDefinesByProperties ('lqS4DR7EH8YwOpPSKWTsjd',$,'Rel Pset','Rel between Pset-Sample Item', (#14) ,#23) 
> #23=1fcPropertySet ('3eHYyWf$j309iq4HBOQt0uP',$,'Pset_SlabCommon', $, (#15, #16, #17, #18, #19, #20, #21, #22) ) 
° #22=IfcPropertySingleValue('LoadBearing','Whether this component is carrying (YES) or not carrying (NO)',IfcBoolean(.T.),$) 


Fig. 6: Query the cost item and the data it contains 


The analysis continues by questioning the geometric entity and identifying the corresponding properties analyzed 
in the cost item; For example, we analyze the element class (/fcS/ab), the PredefinedType attribute (BASESLAB) 
and its Pset_SlabCommon to check whether the element is structural (Loadbearing = True) or non-structural 
(Loadbearing = False), Figure 7. 

#1881=IfcSlab ('1r$167n5fDrxEOIVJzq9my', #19, 'FND_PLA',$,'Platea:FND_PLA_30', #1868, #1880, '242873', .BASESLAB. ) 


- IfcSlab 
- BASESLAB 
> #1900=IfcRelDefinesByProperties ('2zqfiKOTxYZV_kPQ7PQE8u', #19,$,$, (#1881) , #1887) 
> #1887=l1fcPropertySet ('2Unrwa5Dqbjdie587QU4TO', #19, 'Pset_SlabCommon',$, (#242, #718, #719, #1883) ) 
e #242=IfcPropertySingleValue('LoadBearing',$, I£cBoolean ( a Ds gs). 


Fig. 7: Query the geometric object and the data it contains 


After the IFC model query phase, the results are analysed and validated. A comparison of the data identifies any 
inconsistencies (Figure 8). These are reported to the user who can then decide which choice to take: 


- keep the cost-object association unchanged while knowing that the cost item is not completely congruent 
with the geometric object. to identify a cost item; 

- modify the cost item originally associated through the query of the cost database and choosing between 
the proposals identified or if they are not present to create a new cost item to be added to the database. 


This last mode (creating a new entry to be added to the database) has not yet been implemented and will be part 
of the future developments of the research. 
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Fig. 8: Example of verification of the correctness of the association of the cost item to a geometric object 


A report on the analysis of the output data is also saved in parallel with the display of the results; this report is for 
each individual association. In Figure 8 is shown an example of test verification performed on the association of 
the cost item of concrete casting for structural foundations and its geometric object (foundation) with relative data 
feedback. As we can see from the report obtained at the end of the test (Figure 9), the verification of the association 
of the cost item to the analyzed object provides the assessments on the current data. Therefore, it will be possible 
to understand the consistency or not of the data that the geometric object contains with the data contained in the 
associated cost item as visible in Table 2. 
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Verification uniqueness of the associated cost 
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Fig. 9: Report on the analysis of the output data 


Table 2: Verified parameters between geometric object (/fcS/ab) and associated cost item 


Entity Attribute/PSets Parameter Name Geometric Object Cost Item Check 
IfcSlab Attribute PredefinedType BASESLAB BASESLAB 
Pset_ConcreteElementGeneral ConstructionMethod In Situ In Situ 
StrengthClass C20/25 C25/30 x 
ExposureClass XCl XCl 
StructuralClass S1 S4 x 
ReinforcementVolumeRatio 100 100 
ReinforcementStrengthClass - B450C ND 
Pset_SlabCommon FireRating - - ND 
IsExternal FALSE TRUE Xx 
LoadBearing TRUE TRUE 
Status - NEW ND 
AcousticRating - - ND 
PitchAngle - - ND 
ThermalTransmittance - - ND 
Compartmentation - - ND 


7. DISCUSSION 


This study is part of a larger research work that aims to digitize and standardize cost data and identify a new costs 
domain, that can ensure the verification of the correct association between cost items and geometric objects. 


Considering the high uncertainty and inaccuracy of the information during the estimation processes is of 
fundamental importance: 


- define the information requirements that the model must contain (Level Of Information Need) for a 
correct economic management; 

- define an automated control procedure for the same information requirements (IDS) to avoid errors and 
time wasting; 
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- check the correctness and uniqueness of the cost items associated with geometric objects to ensure greater 
correctness of the cost estimate. 


Nowadays cost estimation is one of the most critical tasks in the AEC/FM industry. Therefore, to support, verify 
and improve the quality of cost estimates, in public tendering, and reduce human error-prone, the study proposes 
the identification and applicability of a procedure for the verification of uniqueness of cost data assigned to 
geometric object within IFC data model. This scientific research has led technological attempts through the writing 
of a code in Python and through the support of the library IffOpenShell. The results obtained are real, effective 
and scalable. The scalability of the hypothesized method has been demonstrated as it can also be implemented for 
other models. Currently, however, you can get these results only through code because current commercial 
applications do not allow user friendly implementations. The possibility of developing an executable to facilitate 
the verification of the model by an external user is being studied. 


Currently, as seen in the literature, the approaches used do not provide for a verification of the correctness and 
uniqueness of the association between two different domains, cost and geometry. Typically, the verification is done 
within the geometric model considering only one domain (that of the model itself). In fact, usually only geometric 
interference and checks with the current regulations are carried out. While the goal of the research is to validate 
data between a plurality of domains (in this case the geometric domain and the cost domain) linked together and 
that can be contained in the same model. This causes numerous problems both in the phases of cost estimation and 
in the phases of construction of the work with consequent cost increases and possible disputes between 
customer/commissioning body and enterprise. 


Nowadays, the only possible checks on the association of cost with a geometric object are made manually; there 
is no possibility for machines to understand information not structured and in natural language. For this reason, 
the goal of the research is to create a cost architecture to be associated with a geometric object, richer and more 
granular than a simple attribute associated in the model, allowing the verification of uniqueness and correctness 
of the data. The results of the research confirm the feasibility of the proposed method. 


8. CONCLUSION 


This research work is part of a larger project that will involve the relationship of the information of economic 
objects to the information of geometric objects. Specifically, the research shows how, in the AEC sector, it is 
essential to perform a verification of the associated information during the cost estimation phase for a correct 
management of cost data within construction projects. The research studies and experiments the application of a 
semi-automated method of verification of the cost data to ensure uniqueness and consistency of the information. 
This will allow to quickly and effectively verify if the cost information present in the project is consistent with 
what is stated within the geometric model. 


Despite the many advantages that this application can provide, some limitations have been found in the proposed 
method including: 


- standardisation of information and identification of requirements that the model must have (if the model 
does not follow the specified requirements, it is not possible to carry out cost verification); it is not 
possible to test the method on any IFC model received; 

- need for detailed analysis of the information in the IFC data model for a clear understanding of the 
geometric object; it is not possible to identify the geometric objects from the class attributes alone (Name, 
description, TypeEnum, ecc.) but it is necessary to deepen the relationships that the same creates with 
other entities UfcMaterial, IfcPropertySet, ecc.). A practical example is found among the objects lean 
concrete and slab foundation; they are both /fcSlab.BASESLAB and one of the ways to differentiate them 
is to analyze the LoadBearing single value (True or False) in the Pset_SlabCommon. 


Although the method has been applied to a specific case study in the structural field, the methodology can be 
applied for several case studies. Future developments should include testing it in different areas and models for 
construction and large-scale application to verify its reliability. In addition, an application with user-friendly 
interface must be developed to ensure easier use of the tool. 
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ABSTRACT: In the architecture, engineering, construction, and facilities management (AEC/FM) industry 
methodologies are needed to ensure the interoperability of data and effective management of information from 
different sources. Integration of the cost domain and cost estimation within the Building Information Model (BIM) 
in the AEC/FM sector is still an unresolved problem and one of the most critical tasks due to the lack of a 
standardised cost domain, especially in the tendering phase. 


To ensure interoperability between cost data and geometric data, this research aims to address this gap by 
analyzing methods of converting cost data into Linked Building Data, thereby defining a cost domain in the 
Semantic Web, by collecting them into a graph database. This allows for structuring a cost domain, translating an 
IFC based structure previously developed by the research group, visualizing it using a graph system, and 
connecting it to the BIM geometric domain. Furthermore, it is possible to extend the cost ontology previously 
identified in the IFC model and facilitate the queries and analysis of cost data currently fragmented and based on 
unstructured data. 


The results show how Semantic Web technology can be used to improve data interoperability, develop a cost 
ontology, and join both cost data and BIM models. 


KEYWORDS: Semantic Web, Linked Building Data, IfCOWL, cost ontology, IFC, RDF, graph system. 


1. INTRODUCTION 


The construction process is complex, dynamic and requires numerous interactions between the different actors 
involved. The sharing of different information, including information on the amount of physical components of 
the building, the planning plan, and the consumption of resources and costs is essential for accurate time and cost 
management. 


Nowadays, in the Architecture, Engineering, and Construction (AEC) industry, the standard format used for 
information exchange is the Industry Foundation Classes (IFC), a neutral and open ISO standard by 
buildingSMART. Currently, the most recent IFC scheme is IFC4 ADD2 and contains about 1200 classes. BIM 
software developers can implement an exporter to convert respectively their native BIM format to the neutral IFC 
format. 


Despite the clear advantages of data interoperability, which would otherwise only be feasible in proprietary BIM 
software, the IFC data model still has some limitations (Pauwels & Terkaj, 2016). These include, for example, the 
impossibility of extending it in a user-friendly way, the difficulty of developing applications with this template, 
due to the complexity of the schema expressed in EXPRESS format (Krijnen & Beetz, 2018), or the impossibility 
of relating data between the IFC file and other cloud historicized files. The IFC scheme is used as an interoperable 
format for sharing information, with the aim of being a supplier-independent exchange format, but not a fully 
integrated and comprehensive description of a project. This leads to an immense untapped potential of data in the 
AEC industry (Krijnen & Beetz, 2018). In comparison, the Semantic Web (SW) uses the Resource Description 
Framework (RDF) triple schema to store data and ontologies to enhance the semantic structure to make information 
machine-readable (Berners-Lee et al., 2001). It may also contain multiple ontologies, a formal and explicit 
specification of a shared conceptualisation (Gruber, 1993), covering specific areas not included in the IFC scheme 
(Rasmussen et al., 2017) and enabling the visualization and improvement of the interoperability of the IFC 
information model. 


In order to overcome these limitations, this research is focused on the potential of SW. This research focuses on 
the conversion of the architecture of cost items, assumed and implemented in the IFC data model in a previous 
study carried out by the research group (Cassandro et al., 2023), to RDF using the emerging Linked Building Data 
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(LBD) modular ontologies as proposed by the W3C LBD CG (Bonduel et al., 2018). To validate what has been 
done in the previous study, the IFC-to-LBD converter presented by Bonduel et al. (2018) has been used. 
Information from IFC building models is extracted and transformed into Abox RDF graphs suited for usage in 
Linked Data applications. 


The graph system will contain the relevant information of both the geometric model and the cost architecture with 
the related properties. Data translation within the SW will allow to query the model, associate the two different 
domains studied (geometric and cost) within a single environment, and view cost data and related architectures. 


This paper develops following these steps. First, a detailed analysis of the literature to deepen the themes on the 
SW. Secondly, the identification of the conversion tools available to date for the transition from IFC to RDF and 
the subsequent validation and analysis of the results within a graph database. Finally, the conclusions and future 
works will be set out. 


2. BACKGROUND 
2.1 Literature review 


The first research on the application of the SW in the AEC Industries dates back to the early 2000s; since then, 
their use has spread to more and more areas of the industry, producing interesting results in terms of number of 
publications and significance of results. Beetz et al. (2015) and Pauwels et al. (2017) analysed the reasons for the 
spread of SW in the AEC industry, identifying three main reasons: (1) Interoperability, (2) Liking across domains 
and (3) Logical inference and proofs. 


The possibility of improving the Interoperability (1) relies on the SW structure which provides a way to store 
information in a computer-understandable manner, making possible the comprehension of the information 
involved in the process both by a human being and a machine. 


The linking across domain (2) relies on the opportunity to create a unique web of linked data with information 
from all the different areas of the AEC process, (e.g. GIS, costs, energy, facility management, and so on). 


The third motivation is the logical inference and proofs (3), which relies on the OWL language used for the 
semantic meaning. Correct use of the language offers the opportunity to infer more information from the original 
input, improving the web and allowing us to do more complex queries. 


As we previously said, the usage of SW in the AEC involves different areas of the industry; a review of some of 
the most significant utilisations is reported below. 


H. Abanda et al. (2011) proposed a SW based decision support system, which helps the government to speed up 
and automate the bureaucratic processes. This approach shows how SW could be a powerful ally for the PA to 
manage all the applications for a licence. In Karan et al. (2015) the SW is used to overcome heterogeneity problems, 
which came from the traditional methods of heterogeneous data sharing, generating IFC from semantic web query 
results. The IFC structure lends itself to a translation in linked data, which is why Zhao et al. (2020) suggested a 
method for IFC data merging based on miming the IFC structure with nodes and edges. The IFC graphs are merged 
and then restored in the starting files, implementing the information. Merging information through SW can be 
done not only using the same type of input, like different IFC files; in fact, Malinverni et al. (2022) used an 
approach that merges information from GIS and BIM models, producing an enriched model that has many benefits 
for the entire project life cycle. Also, the safety issues are affected using SW, and Zhang et al. (2015) developed a 
prototype of a new approach to organize, store, and re-use construction safety knowledge to produce a framework 
that supports automated job hazard analysis in BIM. Also, the safety issues are affected using SW, and Zhang et 
al. (2015) demonstrated a prototype of a new approach to organize, store, and re-use construction safety knowledge 
to produce a framework that supports automated job hazard analysis in BIM. 


The damage and degradation analysis can also be positively affected by linked data and an interesting example is 
reported by Jung et al. (2021); they discussed an automatic approach to infer the causes of concrete cracks starting 
from information about pattern, location, and penetration status. The possibility to express human thinking and 
make a logical inference by machines thanks to ontologies can remove the issue of complex qualitative analysis 
during the process of identifying the cause of a crack. 


The most meaningful domain analysed for this paper is 5D planning, for which can be found different approaches. 
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F. H. Abanda et al. (2011) developed an ontology-based technology in modelling information about labour costs 
that aims to facilitate decision-making among building developers in Cameroon; Vakaj et al. (2023) proposed a 
new domain ontology called Offsite Housing Ontology to support cost estimation about resources, products, and 
production processes. A further reference for cost estimation is Fiirstenberg et al. (2021); they studied how 
semantic web technology can support BIM-based automated cost estimation and the related challenges, focused 
on Norwegian road projects. 


The need to find ways of translating IFC into ontologies led to different approaches. The first interesting attempts 
can be found in Beetz et al. (2005), where two different approaches convert the EXPRESS schema of an IFC into 
an ontology in OWL notation: one using an intermediate step with an XSLT file between the XML file and the 
OWL, and a second which derives the OWL notation directly from the original EXPRESS schema format of the 
IFC. After that the approach evolved, leading to Beetz et al. (2009) which presented a semi-automatic way of 
lifting EXPRESS schemas into ontologies. Pauwels et al. (2015) analysed the correct ways to translate EXPRESS 
language into OWL. Hoang and Torma (2015) present the IFC2LD converter, a Java application with a Web 
interface, for converting IFC schemas into OWL2 ontologies and IFC data into RDF graphs aligned with the 
ontologies. Pauwels and Deursen (2015) present their online RDF to IFC conversion service, which converts an 
IFC into RDF triples. Continuing chronologically, Ismail et al. (2017) show a workflow for the automatic 
transformation of IFC into an object graph database; this method is based on a dynamic EXPRESS parser and a 
web script console that creates a meta graph inside Neo4j. Bonduel et al. (2018) developed a conversion of IFC to 
RDF using W3C Linked Building Data modular ontologies. The graphs are structured with three types of 
ontologies: BOT (building topology), PRODUCT (classification of building elements), and PROPS (building- 
related properties); the result is a more user-friendly graph than the ifcOWL Abox graphs. 


2.2 Cost definition in IFC domains focus on IfcCostItem 


Nowadays cost items are associated as attributes to geometric entities. However, to correctly return the analysis 
and economic evaluation processes, it is necessary to have cost architectures configured as more complex systems 
than a simple attribute associated with a geometric object. The IFC standard, through the cost class (/fcCostItem), 
offers the possibility of structuring a cost data model. 


IfcCostitem is anon-geometric entity, a subclass of [fcControl, within IFC. [fcCostItem describes a cost or financial 
value with descriptive information that describes its context (BuildingSMART, 2022). It represents the cost of 
assets and services, the execution of works by a process, lifecycle cost, cost estimates, budgets, and more in the 
IFC standard. 


IfcCostItem is characterized by its own attributes (PredefinedType, CostQuantites, and CostValue) and others 
inherited. An [fcCostItem has the possibility to instantiate one or more cost values (/fcCostValue). Other key 
features are that every single /fcCostItem can be nested to create cost assemblies through the /fcRe/Nests report, 
can be assigned to an /fcProduct through the [fcRelAssignsToControl report, may have an associated product 
through the /fcRelAssignsToProduct report or a resource through the /fcRelAssignsToResource report. 


2.3 IffOWL and Linked Building data (LBD) 


In recent years, the area of cross-domain linking has received increasing attention. This area aims to combine data 
from various sources with construction data, management of information based on ontology, and analysis of the 
performance of buildings (Pauwels, Krijnen, et al., 2017). According to W3C, the Web Ontology Language (OWL) 
is a language designed to represent complex knowledge about objects, relationships between objects, and groups 
of objects in a way that can be exploited by computers UfcOWL - BuildingSMART Technical, 2023.). 
BuildingSMART has developed the IffOWL ontology based on these definitions, providing an OWL 
representation of the Industry Foundation Classes (IFC) schema, maintaining the same status as the Express 
schema. OWL concepts (OWL - Semantic Web Standards, 2023) can be used to construct RDF graphs, called 
OWL ontologies (Pauwels & Terkaj, 2016), enabling easy linking between the building data and material data, 
cost data, GIS data, and so forth. However, due to the complex structure of the IFC data model, the ifcOWL 
representation of geometric data is difficult to manage (Pauwels et al., 2017a). 


Achieving interoperability between domains is the main purpose of the Linked Building Data (LBD) Community 
Group in the World Wide Web Consortium (W3C) (The Linked Building Data Community Group, 2021). LBD 
allows for storing construction data sources separately and processing them through digital and computer systems 
(Curry et al., 2013). This results in a set of data that can be utilized and interconnected. Expandability is a key 
aspect of AEC where most projects are fragmented, complex, and diverse. Using the LBD different ontologies can 


815 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


be mapped and enhanced each other, facilitating a more comprehensive and integrated approach to handling 
diverse data sources and formats within the AEC industry. 


3. RESEARCH AIM & METHODOLOGY 


The research aims to verify the effective possibility of associating the new cost domain with the geometric one 
through the SW. This will allow to relate different domains within the same environment to improve data sharing 
and interoperability. In addition, it will be possible to manage cost items no longer as simple attributes attached to 
a geometric object, but as real cost architectures, more complex, ensuring in the future also the ability to verify 
and validate the associated data. 


The first study to examine the possibility of structuring a cost domain using Industry Foundation Classes (IFC) 
has already been addressed in Cassandro et al. (2023). In this work, based on the assumptions and limitations of 
previous research in Cassandro et al. (2023), has been translated the ontology previously developed in the Linked 
Building Data (LBD) format. This will allow us to relate the cost domain to the existing domains (in the specific 
case the geometric domain); in fact, SW technology is well-suited to link knowledge stored in different domains 
(Beetz et al., 2015; Pauwels, Zhang, et al., 2017). 


The methodology adopted is characterized by the following steps presented in Figure 1: 


1. Study of the State of the Art of the current practices and research connected to SW and graph database; 


2. Analysis of IFC entities UfcCostlItem identified in the standard to manage the cost information) and how 
to translate it into LBD; 


3. Translation of cost ontology, developed in Cassandro et al. (2023) from IFC to LBD through a tool 
developed by Bonduel et al. (2018); 


4. Information validation in a graph database such as Neo4j; 


5. Results of experimental research 


A way to represent cost information by using SW was formulated, developed, and validated. This could provide 
the basis for the information exchange resources among information systems for the more user-friendly query of 
data and linking different domains. The next sections describe the mechanisms used to implement and test the 
method. 


State of the Art Working assumption 
c 
START }—>|Semantic Web-Graph| —>} ItcCOWL-IfeCostitem- mna! OSH | EJ san FINISH 
to mna! | EJ scusson 
a Database LBD e 


—— 


Fig. 1: Research Methodology 


4. RESULTS 


In this research, the case study is the same as that analysed by Cassandro et al. (2023). This allows to fully 
understand the differences between the methodologies adopted and to compare the results obtained during the two 
research. The case study is a wall composed of six layers; each layer corresponds to a different cost item within 
the regional price list (Lombardy Region) which must therefore be associated with the related geometric object, 
see Figure 2. These cost items have already been structured within the IFC data model. 
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GEOMETRIC OBJECT PRICE LIST ITEM 


CODE DESCRIPTION 


@ Internal Painting 


: E FIOT Mele jä: ; 

: e: internal Plaster — i IFC file 
IFC aE H Masonry in Drilled Blocks ——» poe | PAE IFC file 
file : =i Thermal Insulation ——= i IFC file 

ie LIS External Palster _ ; 

: : IFC file 

ie + +t : External Painting i 


IFC file 


Fig. 2: Example layers of masonry and relative price items 


Then the individual IFC files were converted into RDF format using the tool "IFCtoLBD!" developed by Jyrki 
Oraskari and Mathias Bonduel. This tool allows to convert an IFC file to a Turtle file described using BOT and 
optionally PRODUCT, PROPS, and GeoSPARQL (Bonduel et al., 2018). 


The correct export specifications have been set thanks to Bonduel et al. (2018). For the correct output, it is 
necessary to activate the PROPS module of the ontology structure, which includes three levels of complexity. For 
the case study, Level 3 was selected (i.e., the most complete level). Thereby, the Blank node option was activated, 
which decreased the file size, exported RDF, and improved the readability. The output is a Terse RDF Triple 
Language (Turtle); a format made to express RDF data. This format uses triples made by subject, predicate, and 
object to represent information. 


At this point the new file in Turtle format is loaded inside a graph database (in this specific case Neo4j is used); in 
this way, you can view the information and links between these (Figure 3). As visible from Figure 3 every entity 
is associated with a series of intrinsic attributes of the same one. IFC information is intrinsically interconnected 
and can naturally be represented by graphs. Figure 3 shows the IFC file for a painting cost item at the top and the 
corresponding data representation in a graph system at the bottom (nodes and edges). As it is visible, the graph 
representation is more intuitive in revealing the relationships between instances than the text based IFC. 


Translating the IFC data model into a graphical system can lead to a simplified representation of the construction 
information and its relationships, as well as improving data query. 


The developed methodology relies on an IFC, which contains both the geometrical information and cost 
information, inserted into the file using the appropriate classes previously studied in Cassandro et al. (2023). The 
creation and compilation of ZfcCostItem classes has been implemented in Python using IfcOpenShell, as shown in 
Cassandro et al. (2023). 


The conversion from IFC to an LBD was carried out using a tool developed by Bonduel et al. (2018). The correct 
exporter setting has been set after several attempts to get the type of LBD needed. Figure 4 shows all the cost item 
files, in RDF format, imported into the graph database and related to the geometric object (wall system). After that, 
the latter was also imported and displayed, visible in Figure 5. 


The Neo4j graph database has been used to visualize the data. The Neosemantics plugin (n10s) was used to load 
RDF data and its associated vocabularies, including OWL, RDFS, SKOS, and others, into Neo4j. This plugin 
extends Neo4j's capabilities to work with semantic data in RDF format, allowing users to import, store, query, and 
analyze RDF data within the Neo4j graph database. It facilitates the integration of RDF-based knowledge graphs 
and linked data into Neo4j, enabling more comprehensive and semantic data modeling and analysis. 


' https://github.com/jyrkioraskari/IFCtoLBD 
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Fig. 4: Representation in the graph system of the six different cost items starting from IFC files 
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Geometric Element — Wall System 


Fig. 5: Representation in the graph system of the geometric object (wall system) starting from IFC files 


Through the Cypher script language, it was possible to work with the different Turtle format files available. Cypher 
is a declarative query language created specifically for working with graphs and interacting with the Neo4j 
database. Cypher queries are very expressive and readable and allow operations such as creating, editing, and 


querying data within the Neo4j database. 


The first step was to load the data of the different files (Figure 6 - Script 1, Script 2). Subsequently, the data files 
were queried, and two node-entities were identified: [fcCostltem (Figure 6 — Script 3) which gathers all the 
architectural data related to individual cost items (within the cost domain) and /fcE/ement (Figure 7 - Script 7) 
which represents the geometric domain to which the cost item must be associated. 


CALL n10s.rdf.import.tetch{ «file:///path/to/file.tti», "Turtle") 


terminationStatus 


triplesioaded tripe sParsed 


MATCH (n) 
RETURN n; 


MATCH (n:ns0_ 
WHERE ID(n) = 5172 
RETURN n; 


 @ 


MATCH (n:ns0 Ulem){rens0_ relatedObjects_IfcRelAssigns]- @ æ 
(m:ns0__HcRetAssignsToProduct)- 

[r2:ns0__relatingProduct_lfcRelAssigns ToProduct]-(m2:ns0__ifcCovering) 

RETURN n, r, m, r2, m2; 


CREATE (newNode:ns0__ tick ignsToControl {name: 
"Rel IfcCostitem-lfcElement_IntPainting’}); (-) 


RETURN newNode 


MATCH (costitem:nsO__ em) 

WHERE ID(costitem) = 5172 

MATCH (newNode:ns0 R an ontrol) 
WHERE ID(newNode) = 5240 


CREATE (new Node) 
[:nsO__relatingControl_IfcRelAssigns}->(cov! ) 
RETURN costitem, newNode; 


Fig. 6: First script sequence for [fcCostItem-IfcCovering connection (cost-layer painting) 
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Finally, the relationship between the two nodes-entities contained in the two domains has been created (Figure 7 
— Script 9). As we can see in Figure 7, the two entities belonging to the two different domains (costs and geometry) 
have been connected using the logic intrinsic to the IFC data model. A node ("5240 - [fcRelAssignsToControl") 
has been created that corresponds to the exact IFC entity that ensures the connection between geometric entities 
and cost entities. This has led to the connection of the two different domains (cost and geometry). 


Script 7 
MATCH (cov:ns0__IfcCovering)-[r:ns0__name_IfcRoot)-{m:ns0__iicLabel) @ 


MATCH (cov:ns0__IfcCovering)-[b:ns0__globaltd_IfcRoot]-{c:ns0__IfcGloballyUnigqueid) 

WHERE ID{cov) = 4723 & 

MATCH (newNode:ns0_ IicRelAssignsToCe Ih:ns0__relatingControl_ifeRelAssigns]- 

(costitem:ns0__ {item )}fd:nsO__relatedObjects_ifcRelAssigns}-(e:nsO__ } j 
{r2:nsO__relatingProduct_IfcRelAssignsToProduct}{samplewall:ns0__ifcCovering) 

RETURN wall, r, m, b, c, costitem, h, newNode, d, e, 12, samplewall @ © 


MATCH (newNode:ns0__'icRelAssignasToControl) @ 
WHERE ID(newNode) = 5240 


MATCH (covel:ns0__ifcCovering) j 
WHERE ID{covel) = 4723 i 
CREATE (newNode)-[:ns0__relatedObjects_IfcRelAssigns]->{covel) 


RETURN newNode, covel (=) 


MATCH (cov:ns0__!feCovering)-[r:nsO__relatedObjects_lfcRelAssigns] ==] 
(newNode:ns0__lichelAssignsToControl)-{h:ns0__relatingControl_ifcRelAssigns]- i 
(costitem:ns0__ifcCost tem){d:ns0__relatedObjects_IfcRelAssigns]-(e:ns0__ juct}- ~ / 
[r2:ns0__relatingProduct_IfcRelAssignsToProduct]-(samplecov:ns0__|fcCovering) ‘ “i } 


RETURN cov, r, costitem, h, newNode, d, e, r2, samplecov 


Fig. 7: Second script sequence for [fcCostItem-IfcCovering connection (cost-layer painting) 


This procedure has been replicated for the remaining cost items to associate with the respective geometric objects 
to obtain a new system to graph containing the data of the geometric domain and the cost domain. Figure 8 shows 
the final output and a zoom on the association of the cost of the layer of internal painting 
(IfcCostItem_InternalPainting) to its geometric object (JfcCovering). 


MATCH (n) 
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Fig. 8: Final output and zoom on the association of the cost to its geometric object (fcCovering) 
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5. DISCUSSION 


The study seeks to optimize and make the data of the cost items more user-friendly. Currently, these are displayed 
only as informational inputs, lines of code, or even simple attributes. The aim of the research is to focus on a 
subject of great interest and still cause numerous legal disputes. In the research, the possibilities and the limitations 
of the proposed method are highlighted based on the association of different domains within a graphic system. 
This would allow you to link different data from files even in different formats. This study presented how price 
elements can be instantiated as a graph and how their visualization allows us to better understand the logic of 
connection and relationship between entities. It has been studied the possibility of visualizing a new domain of 
cost for the price list of the Lombardy Region based on a graphical system to standardize and regulate the prices 
and the relative information. 


Converting the IFC data model into a graph system can bring many advantages (Pauwels, Zhang, et al., 2017), 
(Zhu et al., 2022). The main reasons why converting IFC to a graph is an objective to be pursued are: 


- a simplified representation of construction information (Silvescu & Caragea, 2019); 
- clearer object relationships (Figure9); 
- improved data integration and interoperability (Rodriguez & Neubauer, 2010), (Mazairac & Beetz, 2013); 


- improved query and processing information (Pérez et al., 2010). 


However, even if this methodology makes the data more visually intuitive, due to the limitations of the current 
tools available, it is not yet possible to convert the information in the IFC data model into RDF in a simplified way. 
A further limitation is due to the use of a programming language to query, create, and associate data belonging to 
different domains. This causes problems in a sector, the AEC, which is only beginning in recent years to interface 
with these new technologies. 


This research has led to technological attempts made through the writing first in IfcOpenShell of individual cost 
items and then in RDF format for their association to the geometric domain and their simplified visualization; in 
this way, it is not necessary to understand the logic behind the IFC data model. The achievement of results that are 
real, effective, and scalable confirms the scalability of the method as it can also be implemented for other list items. 


6. CONCLUSION 


The results show how SW technology can be used to show and relate the cost domain for construction projects to 
different domains, such as the geometric domain. Starting from the IFC data model, the cost domain has been 
translated into LBD ontology thanks to the IFCtoLBD converter developed by Bonduel et al. (2018). In fact, due 
to the complex structure of the IFC data model, the LBD representation makes it easier for stakeholders to visualize 
and manage the data contained in the models. The IFCtoLBD converter developed by JURI uses the smallest BOT, 
PRODUCT, and PROPS ontologies to better separate and represent data (Bonduel et al., 2018). 


In this study, the proposed architecture for the new cost domain, developed and validated by Cassandro et al. 
(2023), can be easily visualized within the graphical database. As a result, individual cost items and related 
information become readily accessible in the graphical database. Moreover, these elements can be queried 
individually, associated with their corresponding geometric objects, or even extended by creating new nodes and 
relationships. 


During the study, several limitations were identified. Firstly, the complexity of data in the AEC industry can be 
more difficult due to the variety of information involved, demanding a deep understanding to integrate data into 
LBD format. Secondly, the AEC domain includes numerous standards and ontologies, which makes it difficult to 
ensure compliance with all relevant standards (e.g., IFC) when integrating data into LBD. Finally, interoperability 
remains a concern as not all software platforms are equipped to handle LBD, posing difficulties in achieving 
seamless data integration and interoperability across different platforms. 


For future work, it is essential to prototype and extend the concepts explored in this study. In addition, the 
development of an extension for the converter to improve data translation should be considered. 
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ABSTRACT: Poor construction quality is one of the most significant challenges for the construction industry. 
However, failures can be avoided or minimized by inspections based on detailed quality inspection plans as a part 
of quality assurance. Therefore, structured and project-specific planning of inspection plans is required to provide 
inspectors with the right information. 


Nevertheless, inspection planning is mainly manual, dependent on the individual's experience and high level of 
effort. As a result, inspection planning is often neglected and limited to providing general checklists that often lack 
semantically rich descriptions and are unspecific concerning individual project requirements. Furthermore, proper 
planning of inspections requires multiple information sources, such as building design, schedules, contractual and 
supplier guidelines, and standards, all of which must be provided or linked via an information model. Current 
research lacks an adequate formalized knowledge model to provide the knowledge-driven inspection planning 
process with the necessary domain knowledge to support inspection planning with heterogeneous information 
defined in isolated systems. 


Therefore, this paper extends the Ontology for Construction Quality Assurance (OCQA) with the OCQA-Thermal 
Insulation (OCQA-TI) to formalize thermal insulation inspection planning knowledge. The OCQA offers a new 
linked data model that provides explicit knowledge of quality inspection planning. The development of the OCQA- 
TI follows the Linked Open Terms (LOT) methodology and is implemented using the Web Ontology Language 
(OWL). 


The proposed ontology is evaluated using various approaches, including automatic consistency checking, 
answering competency questions, and criteria-based evaluation. The results indicate that the OCQA-TI can 
provide inspectors with relevant inspection planning knowledge and integrate various related information streams, 
thus providing a more comprehensive and efficient approach to insulation inspection planning. The functionality 
of OCQA-TI enables the fulfillment of increased sustainability and energy efficiency requirements by providing 
insulation inspection knowledge. 


KEYWORDS: semantic web, building insulation, ontology, quality assurance in construction, inspection planning 


1. INTRODUCTION 


Construction faces a significant quality problem, evidenced by an analysis by the German insurance company 
VHV showing that the total cost of reported damage claims has risen for years (Böhmer et al., 2022). A study by 
BauInfoConsult (2022) concurred, assuming a defective cost share of 12.8% of the industry turnover. With a total 
construction industry turnover of 143 billion euros in 2020 issued by the German Construction Industry Federation, 
the study extrapolates error costs of approximately 18.3 billion euros (BauInfoConsult, 2022). 


Research and practice approaches have primarily focused on defect management yet have often neglected 
proactive defect prevention through inspection planning (Seif, 2022). However, proactive inspection planning 
could avoid many errors and save costs. The VHV Report shows that approximately 58% of the causes of damage 
can be traced to avoidable execution and assembly errors as well as interface and communication problems. In 
addition, inadequate construction supervision accounts for approximately 7% of the causes of damage (Böhmer et 
al., 2022). Thus, a proactive, construction-accompanying quality assurance approach can address approximately 
65% of all causes of damage. 


Currently, energy efficiency and building sustainability requirements are increasing significantly worldwide. The 
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European Union has set targets for the construction and real estate industry’s energy future. Climate neutrality in 
the building sector is to be achieved by 2050. Hence, new buildings must be constructed to the zero-emissions 
standard as early as 2028 (European Parliament, 2023). In the specific domain of insulation, quality assurance is 
particularly important. Insulation is crucial to buildings’ energy efficiency, thermal resistance, and interior comfort 
since poor insulation installation and use can lead to heat loss, thermal bridges, moisture problems, and an overall 
decline in a building’s energy performance (Sassine, 2013). 


National policies were implemented after the oil shocks of the 1970s and the subsequent energy crisis to meet the 
requirements of stricter insulation and energy efficiency regulations. These policies led to adopting standards in 
France, such as the “Documents Techniques Unifiés” and the RT 2012 and RE2020 rules, regulating the materials 
used and their installation methods while quantifying buildings’ losses, gains, and energy needs. However, the 
practical application of these rules has sometimes been questionable due to negligence or ignorance. Hence, it is 
difficult for a conventional inspection to identify potential defects since the insulation materials are always covered 
with various finishing elements (Antoine Szeflinski, 2007). 


Non-destructive testing plays a crucial role in addressing these issues, allowing for precise recognition of 
construction phases, authentic elements, original and restoration materials, construction techniques, deterioration, 
durability, and structural strength. These are essential for assessing the energy and environmental performance of 
the building envelope. Visual tests, thermographic tests, sonic and ultrasonic tests, thermal flux measurements, and 
microclimate analyses are used in energy audits to obtain accurate results (LUCCHI, 2011). Notably, thermal 
insulation implementation inspections often focus on performance after completing the construction work. 
Nonetheless, this approach has a major drawback: a delay in detecting implementation defects. While waiting for 
performance inspections, quality problems can develop and become more difficult and costly to correct. 


Therefore, considering corresponding inspection plans to guarantee the quality of finishing trades such as thermal 
insulation and meet the expanding quality requirements is necessary. However, quality assurance in the building 
industry concentrates on shell construction trades (Berner et al., 2015), while finishing trades often cause increased 
costs due to defects. In analyzing the BSB’s quality inspections during construction, damage to thermal insulation 
accounts for a comparatively high 10.7% of total damage (Böhmer et al., 2020). France’s heat and climate 
protection ordinance has led to a steady increase in the proportion of thermal insulation measures (Böhmer et al., 
2020). Sales of External Thermal Insulation Composite Systems (ETICSs) have also steadily risen in Germany 
since 2018 (Statista, 2023). In parallel, the number of defects caused by ETICS has also increased. For example, 
building projects with ETICS are often affected by defects such as cracking, moisture penetration, and 
delamination (Böhmer et al., 2020). 


Hence, introducing inspections of thermal insulation implementation throughout the construction process would 
be beneficial in addressing this problem (Zhong, 2012). This approach would allow for early defect detection and 
correction, implementation quality improvement, and optimal energy performance assurance in buildings from 
completion. Thus, exploring automation solutions to evaluate inspections would be imperative to bridge the gap 
between regulatory standards and on-site inspections. However, to our knowledge, there is still a lack of a 
structured and computable representation for formalizing the thermal installation regulatory standards. Meanwhile, 
the interoperability between standards and the on-site inspection information also remains. Ontology is known as 
a formal specification of the domain knowledge, which has been widely explored in the construction domain to 
facilitate knowledge management and information integration issues (Pauwels et al. 2017). But an ontology that 
is adequate to represent specific domain knowledge of the thermal installation quality inspection is still missing. 
Therefore, in this research, we aim to extend the Ontology for Construction Quality Assurance (OCQA) for 
involving the specific domain knowledge for the thermal installation inspection process. The proposed ontology 
aims to formalize thermal insulation inspection planning knowledge and link the on-site inspection information to 
achieve an automated evaluation of the thermal insulation evaluation. Inspections related to thermal insulation 
require the integration of data from various and heterogeneous sources, and the choice of technologies associated 
with ontologies was made to provide a framework for aligning and connecting disparate knowledge. 


This paper is structured as follows. Sections 1 and 2 provide a general review of quality assurance and thermal 
insulation. Section 3 reviews the work on ontologies related to general inspection planning, while Section 4 
describes the methodology used to develop the proposed ontology. Next, Section 5 discusses the specifications of 
OCQA-Thermal Insulation (OCQA-TI) ontology, while Section 6 provides a methodological conceptualization 
and implementation, which is evaluated using an example in Section 7. Finally, Section 8 discusses the limitations 
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and contributions of the research and provides conclusions. 


2. BACKGROUND 
2.1 Quality Assurance and Inspection in Construction 


Presently, there is a greater focus on comprehensive quality inspections and defect management to satisfy 
contractors in their construction projects until the client’s final acceptance. However, despite these efforts, many 
gaps and discrepancies are still identified near the project’s end. The issues are related to a lack of on-site personnel 
for managing quality and defects, an overwhelming workload to meet deadlines, non-unified checklists with much 
manual paperwork, poor communication among different project stakeholders, and a complex and labor-intensive 
interior finishing process. These problems require additional work to rectify the defects, often leading to project 
delays (Young S. Kim, 2008). When defects are detected after acceptance, the contractor has claims, so for 
executing companies, the expense of eliminating defects is worth avoiding defective work at all costs. Moreover, 
the services of the subcontractors must be checked for defects. Indeed, when a subcontractor’s defect is 
unrecognized before the execution and leads to a defective overall performance, it results in a renewed execution 
(Langen & Schiffers, 2005). 


Currently, construction projects have become increasingly complex. Hence, the more complex the construction 
projects and processes, the greater the quality risks and the relevance of undistributed information and 
communication processes in the planning and construction process (Böhmer et al., 2022), often leading to 
avoidable errors. In addition, decisions regarding quality inspections, including construction activities not to 
overlook and tasks to inspect, the quality data to be collected and verified, and the acceptance criteria for quality, 
are evaluated on-site by inspectors based on their previous experiences, which can vary from one inspector to 
another (Tan, 2010). This variation may arise for several reasons, such as a lack or absence of knowledge 
concerning regulations or feeling overwhelmed by the amount of regulatory text referenced for applicable 
provisions during quality inspections. As a result, manual verification of construction quality compliance has been 
a time-consuming and error-prone task (C. Eastman, 2009). 


Therefore, construction inspectors require assistance with planning inspections to effectively define inspection 
objectives, explore various options, and efficiently allocate inspection resources. Thus, the inspection planning 
process can be divided into four steps (Lin, 2018) used to specify inspections by answering key inspection planning 
questions (DIN9001, VDI2619). First, the necessity of inspecting a characteristic (what?) is determined, and the 
associated inspection objects are defined. Second, the inspection time (when?), frequency (how many?), and scope 
(how much?) are determined. The third step defines the inspection procedure (how should the inspection be 
conducted?) and equipment (what materials are required for the inspection?), which must be simultaneously 
conducted since they are interdependent. This step also includes the inspection location (where is the inspection?) 
and the inspector (who is inspecting?). Finally, the fourth step determines the recording, management, and 
evaluation of the collected inspection data (Lin, 2018). 


Furthermore, the agreed quality must be clarified as a building requirement to assess the objective quality and 
secure the success of construction work. In Germany, this clarification results from the general contractual 
conditions, the performance specifications, and the technical contractual conditions, according to §1 Para. 2 
VOB/B. The construction target and, thus, the quality requirements of each construction project vary according to 
the project-specific performance specifications and contract components. A minimum set of requirements is 
ensured in German construction law in §1 para. 1 VOB/B by the obligatory agreement on the provisions of VOB/C, 
the General Technical Terms of Contract for Construction Work (VOB, 2016). 


Moreover, verification in the form of testing activities is necessary per DIN EN ISO 9000 to ensure the quality of 
construction projects. Therefore, meeting the specified requirements should be verified using objective proof (ISO 
9000:2015). In the construction industry, these inspection activities are conducted with structural acceptances 
divided into the following types: internal acceptance (according to the quality management plan) and acceptance 
under public law. Hence, a defect exists if the executed construction work deviates negatively from the contractual 
construction target (Berner et al., 2015). 


2.2 Semantic Web and Ontologies 


Data in the construction industry is generated in isolated systems and exchanged using various file formats, often 
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with an insufficiently described relationship. The semantic web and associated linked data concept offers technical 
standards for a comprehensive, machine-readable exchange of heterogeneous information (Beetz et al., 2021). In 
a semantic web context, ontologies describe a shared conceptualization of a specified knowledge domain in a 
community of users. Therefore, using semantic web technologies, an ontology represents a part of the real world 
in a semantic model (Synak et al. 2009). 


The foundational semantic web language is the Resource Description Framework (RDF), and its data structure is 
organized in triplicate, consisting of a subject, predicate, and object. The subjects and objects are depicted as nodes, 
while predicates are represented as edges connecting these nodes. As such, these RDF structures are often referred 
to as triplestores or RDF stores (Hitzler et al., 2008). These RDF structures are characterized by directed and 
labeled graphs (Herman 2004). 


The capabilities of the RDF are further enhanced by additional languages and schemas, such as the Resource 
Description Framework Schema (RDF-S) and the Ontology Web Language (OWL). For instance, RDF-S broadens 
the scope of the RDF schema by introducing classes and defining properties. In comparison, the OWL utilizes 
RDF-F to represent ontologies and extends the model by establishing constraints among classes, properties, and 
entities (Allemang et al., 2020). 


Typically, ontology knowledge provided by semantic web technologies is divided into two main components: an 
assertion box (A-box) and a terminology box (T-box). In some semantic web applications leveraging a specific 
tule language, an additional component, a rule box (R-box), is included. The T-box holds intentional knowledge 
or the terminology outlining a specific domain using a database schema-like vocabulary that incorporates classes, 
properties, and relations. The T-box’s concepts remain constant and do not vary over time. In contrast, the A-box 
manages the facts tied to the terminology terms introduced by the T-box. In this way, the A-box instantiates the 
predefined classes and conceptual model with real-world individuals. It encompasses extensional knowledge about 
specific situations likely to change over time (Baader et al. 2007; Baader et al. 2017; Pauwels et al. 2017). 


3. ONTOLOGY WORKS RELATED TO INSPECTION PLANNING 


As we step into a new era of digital transformation in various sectors, there has been a surge of research and 
development efforts in the field of ontologies. Some have been particularly related to construction and inspection 
planning. Thus, this section aims to provide a comprehensive literature review starting with general ontology work 
in the construction domain and ontologies used in inspection planning. 


Recently, ontologies have become increasingly relevant in the AEC sector, addressing challenges like data 
integration and knowledge management (Pauwels et al. 2017). Various ontologies have been created in the 
construction field, including the generic e-COGNOS (El-Diraby et al. 2011) and IC-PRO-Onto (El-Gohary et al. 
2010). These ontologies, while comprehensive, lack a detailed representation of construction inspection, 
necessitating further expansion for accurate inspection representation. Therefore, further development resulted in 
Digital Construction Ontologies (DiCon) by Zheng et al. (2021), defining broader construction workflow-related 
entities and successfully integrating data from diverse systems. Beyond construction, ontologies such as IFCOWL 
(Pauwels 2016) and the Building Topology Ontology (BOT) by Rasmussen et al. (2020) have offered valuable 
semantic structures for building information modeling and topological concepts, respectively. 


In the realm of construction execution, ontologies have made significant strides in addressing issues such as 
construction data integration and knowledge management. Therefore, this section focuses on ontologies 
specifically tailored to construction quality inspection. Zhong et al. (2012a, 2012b) introduced the CQIEOntology 
for quality management. This ontology primarily focuses on quality compliance checking while supporting 
inspection planning in an ancillary manner. The ontology expresses complex constraints usually found in quality 
regulations using the Semantic Web Rule Language (SWRL). This ontology-based approach enhances the 
construction quality inspection process by integrating these regulations within the construction process. 


Martinez (2019) proposed an alternative method involving SPARQL queries to retrieve quality regulations based 
on the construction object and materials used, as specified in the ontology. Although the study primarily focused 
on predefined checklists, it overlooked the planning and preparation of quality inspections. The research aimed to 
primarily explore offsite manufacturing processes related to drywall steel frames. 


Lastly, Xu (2019, 2021) developed the Highway Construction Inspection Ontology (HCIOntology), formulating 
inspections based on pay items and specifications defined by the Indiana Department of Transportation. The user 
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interface allows users to select predefined check items based on to-be-inspected pay items, supported by a risk 
matrix. These check items contain various fixed characteristics of inspections, such as frequencies, objectives, 
checking conditions, and training documents. 


The review demonstrated that ontologies are viable for formalizing construction domain knowledge. However, it 
also indicated insufficient effort is spent integrating diverse information sources and making inspection planning 
knowledge of multiple domains (e.g., trades) available. Additionally, the inspections are inadequately described 
and undefined in light of the shared terminologies provided by ISO 9000, ISO 9001, and DIN 55350. Research 
has also neglected inspection planning for finishing trades, specifically the thermal insulation trade. Therefore, this 
research paper aims to 1) prove the reliability of OCQA for multiple trades, 2) provide knowledge for inspection 
planning of the thermal insulation trade, and 3) integrate heterogeneous information defined in isolated systems 
related to thermal insulation inspection planning. 


4. METHODOLOGY 


Concerning OCQA, a hybrid approach was adopted based on the Linked Open Terms (LOT) methodology 
developed by Poveda et al. (2022). LOT was designed as an industry-oriented ontology development methodology 
more related to practical engineering cases than others and has been well evaluated through practical use. 
Furthermore, the activities in applying LOT are described in detail and documented on GitHub (Poveda-Villalon 
et al., 2022). The detailed process of the ontology’s development methodology is illustrated in Figure 1. The 
process is divided into four phases: 1) specification, 2) implementation, 3) publication, and 4) maintenance. 


The specification phase defines the ontology’s scope, purpose, use cases, users, and requirements (Fernandez- 
Lopez, M. et al., 1997). Throughout the entire ontology engineering process, knowledge acquisition remains an 
ongoing and iterative process. Thus, a conceptualization is created, incorporating existing ontologies and encoding 
by building upon the specification. Subsequently, the developed ontology is evaluated to ensure its effectiveness. 


Various evaluation methods are employed to assess the developed ontology. Following the evaluation phase, the 
ontology is documented using a Hypertext Markup Language (HTML) sheet. For wider accessibility, the ontology 
is published on GitHub, enabling a broad range of users and developers to access and reuse it. 


GitHub is a valuable platform for ontology maintenance, facilitating collaboration and version control based on a 
repository (GitHub, 2020). It ensures the ontology remains up-to-date and well-maintained for its users and 
stakeholders. Thus, the following sections present the process steps of specification, implementation, and 
publication. 
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identification 


Fig. 1: Ontology development methodology according to Poveda-Villalon et al. (2022) 


5. SPECIFICATION 


This specification aims to create a formal document in natural language, utilizing competency questions (CQs) as 
the fundamental requirements (Fernandez-Lopez, M. et al., 1997). These CQs are derived from predefined use 
cases, aligning with the purpose and scope of the respective ontology. The resulting specification document is 
available on GitHub (Poveda- Villalon et al., 2022). The subsequent discussion formalizes the specification process 
steps. 
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Purpose: The ontology aims to formalize inspection plans for the domain of thermal insulation and related 
information of required domains, providing a shared representation of insulation inspection planning knowledge 
that specifies the terminology, semantics, and relations of inspection planning in the domain of thermal insulation. 
The terminology is defined according to DIN 55350:2021 and ISO 9000. Therefore, the purposes of the ontology 
can be summarized as follows: 


1) providing a vocabulary for describing inspection plans and inspections for thermal insulation, 
2) supporting manual inspection planning by providing detailed inspection planning knowledge, 
3) supporting quality assurance decisions, and 

4) strengthening the quality awareness of the staff. 


Scope: OCQA-TI mainly focuses on representing the implementation of thermal insulation to ensure its proper 
installation and performance in new buildings and renovation. It provides structured knowledge on the optimal 
ways to install thermal insulation depending on the product and technique used while describing the necessary 
inspections, tests, equipment, and norms required to ensure its quality. Critically, our research focuses exclusively 
on process planning and the quality assurance of thermal insulation installation. 


Use cases: The intended use cases of the ontology can be summarized as follows: 


1) information retrieval by using queries, 

2) heterogeneous data integration from isolated software solutions, 

3) inspection planning support by querying inspection planning knowledge, and 
4) inspection plan validation by reasoning. 


End users: End users are project managers and management teams responsible for construction supervision, 
including inspectors, object planners, and building owners, depending on the project organization. In addition, the 
contractors can use the test plans provided for internal quality assurance. Notably, the users described do not use 
the ontology directly but interact with it via a software application. The ontology represents a substructure 
(backend) of software and is only used directly by developers (backend/substructure). Developers program 
software applications based on the ontology and store the data from heterogeneous sources in the ontology. End 
users use the ontology via an interface (superstructure or frontend) that ensures user-friendly operation 
(frontend/superstructure). For this purpose, users are provided with predefined queries for information or 
inspection planning knowledge retrieval. 


Non-functional requirements (NFRs): NFRs pertain to how a system should perform, behave, and operate rather 
than focusing on what it should accomplish. In the field of construction ontologies, various studies have outlined 
NFRs, which can be summarized as follows: 1) coverage/sufficiency, 2) consistency, 3) usability, 4) 
extendibility/reusability, and 5) clarity and conciseness (Costin, A. and Eastman, C., 2017; Zhou et al., 2016; El- 
Gohary and El-Diraby, 2010; Zheng et al., 2021). These defined NFRs serve as evaluation criteria addressed 
through various evaluation methods. 


Functional requirements (FRs): FRs are CQs aligned with the predefined use case of information retrieval. Table 
1 provides a comprehensive list of the CQs that the proposed ontology aims to address based on questions for 
inspection planning from mechanical engineering that have been extended (Lin}, 2018; Marxer et al., 2021). 


Table 1: CQs related to the use case information retrieval. 
Information retrieval 
1. What are the characteristics being inspected? 
2. What are the related entities of an inspection? 
a. Where is the inspection location? 
b. Who is responsible for the inspection? 
c. What is the norm related to the inspection? 
d. What equipment is required for the inspection? 
3. What are the precedent and subsequent inspections? 
4. What procedures are required for the inspection? 
5. When is the start and end date of the inspection? 
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6. ONTOLOGY CONCEPTUALIZATION AND IMPLEMENTATION 
6.1 Introducing the Ontology for Construction Quality Assurance (OCQA) 


The basis for this work is the OCQA ontology (SeiB, 2022), 
designed to provide information about quality inspections in 
construction to support inspection planning. The main entities of 
the OCQA are the inspection, inspection plan, and regulations. 
The OCQA has a modular structure and can be supplemented as 
desired with inspection knowledge from different trades. 
Therefore, the OCQA-TI is designed as a trade-specific extension 
of the OCQA ontology. In addition, the Digital Construction 
Ontology (DiCon), with its previously described concepts for “J 
construction workflow, including agents, processes, and  ocga 

equipment, is integrated into the OCQA, extending the DiCon Esiin 


Insulation 


with detailed knowledge about inspection plans (Seif, 2022). Fig.2: OCQA-TI as an extension of the OCQA 


The integration among the various ontologies is depicted in Figure 2 and is achieved through hard reuse via 
owl:imports. This approach allows the complete and unaltered reuse of the imported ontology (Poveda-Villalon et 
al., 2022). The OCQA ontology also includes trade-specific modules catering to task-specific inspection planning. 
For instance, the OCQA-screed extension encompasses all inspections related to the screed trade. The OCQA-TI 
will be developed as a trade-specific extension of the OCQA. 


6.2 OCQA- Thermal Insulation (OCQA-TI) 


This section overviews the OCQA-TI and the associated ontologies in detail. Therefore, the classes, relations, and 
properties needed to describe inspections related to the domain of thermal insulation appear in Figure 3. The top 
of Figure 3 illustrates the DiCon ontology and the main classes used as a basis to define the OCQA. The OCQA 
provides the general terminology to describe inspections, which the OCQA-TI will use to specify inspections for 
the trade of thermal insulation. 


Representing all aspects of the thermal insulation inspection was essential to conduct it, including the actual 
inspections, the corresponding inspection plan, inspection equipment, personnel involved, inspection procedures, 
and inspection regulations through specific classes within the DiCon and OCQA ontologies, defined as subclasses 
of the DiCon ontology. The ocqga- Inspection class is a specialized subclass of dicp: Activity class, highlighting its 
unique role in the context of activities defined by DiCon. 


DiCon dicechasLocati — dicp:occupiesTime , ~ = - ai l 
dicechasSubLocation i i Se eae wee / 
apat dice:Location Sas z ` Interval = — A) A 
bag ` e al Dae = = - z 
= x diep:hasEquipment 


~ TD aa 
sT 
eas oper ae oF 
dicp-hasObject dice:isResponsiblefor? dica:Agent 


OCQA a ocqa:contains 
ocqa:InspectionPlan aol pe 


~ 


ocqachasInspection X 


aiman ocqa-reg: Regulation 


~ Scqa-regrequiredby 


— rdfs:subClassOf 


> owl:ObjectProperty 
råfs:Class 


Fig. 3: Overview of classes, relations, and data properties of OCQA-TI and aligned ontologies. 


The OCQA-TI is an ontology extension that enhances and refines the OCQA ontology, building on the DiCon 
ontology’s foundational concepts. Within the OCQA-TI, the :ThermalInsulationInspection class represents various 
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types of thermal insulation inspections like :SpacingInspection, — -AirTightnessInspection, 
and : VapourBarrierInspection and is connected to the ocga: Inspection class in the OCQA, signifying their specific 
role within the scope of quality inspections. Moreover, the /nspection class links to the ocqa:InspectionPlan class 
through the object property ocqga:contains, establishing a relationship between inspections and their inclusion in 
inspection plans. Additionally, the ocqa:hasInspectionProcedure object property connects the ocqga:Inspection 
class to the ocqa:InspectionProcedure class, specifying the procedures associated with each inspection. 
The -ThermalInsulationNorm class in the OCQA-TI, encompassing norms relevant to each thermal insulation 
inspection like RE2020, cahierDeCSTB3728, and cahierDeCSTB3194 V2 is introduced by the object property 
ocqa-reg:requiredby, which link this class to the ocga-regulation: Regulation class. 


Furthermore, the :ThermallnspectionEquipement class is introduced to describe the equipment required for 
thermal insulation inspections like :ThermalCamera, Infiltrometer, and :BlowerDoor and is directly linked to the 
ocqa:InspectionEquipement class in OCQA, which, in turn, is connected to the dice: Equipment class in the DiCon 
ontology, establishing a hierarchy of equipment concepts. Through these interconnected classes and object 
properties, the ontologies provide a comprehensive framework for managing and understanding thermal insulation 
inspections in the construction industry, utilizing construction quality assurance and workflows. 


7. ONTOLOGY EVALUATION 


The ontology evaluation is essential to check whether the developed ontology meets all the predefined 
requirements in the specification process (Fernandez-Lopez, M. et al., 1997). Different evaluation methods can be 
used to check compliance with different criteria (Zheng et al., 2021; El-Gohary and El-Diraby, 2010). This research 
evaluates the proposed OCQA-TI by automated consistency checking, CQs, and a task-based evaluation. 


7.1 Automated Consistency Checking 


Automated consistency checking aims to assess the developed ontology’s consistency, guaranteeing no logical 
conflicts or inconsistencies. This process is achieved with description logic (DL) reasoners to assess the logical 
coherence of the ontology and maintain its integrity by ensuring there are no contradictions. Regarding the OCQA- 
TI, the automated consistency checking was conducted within the Protégé environment using the built-in Pellet 
reasoner. The automated consistency checking results showed that the OCQA-TI is consistent and coherent. 


7.2 Answering the CQs and Task-Based Evaluation 


We obtained practical inspection data from a residential building project about air tightness inspections, which are 
essential to ensure the condition of the air permeability before proceeding with the installation of thermal insulation. 
The assessment involved of implementing blower door tests while utilizing an infiltrometer as the specialized 
equipment to measure the value of the air permeability. The blower door tests were aligned with the stipulated 
guidelines of the RE2020 regulations, in which an air permeability constraint requires the measured value should 
be less than 0.6m3/(h.m3) for individual building types and 1m3/(h.m3) for residential building types. 


The above example data was mapped and instantiated to OCQA-TI for two evaluation tasks. First, we used the 
instance data for answering the specified CQs. For the OCQA-TI, a SPARQL query was conducted to retrieve the 
target information of the thermal insulation inspection task to answer the example-specified CQs adapted from 
ontology CQs defined previously to check the coverage of the ontology. The CQs and results appear in Table 2. 
The results show that the OCQA-TI can retrieve accurate inspection information to answer the CQs, proving that 
OCQA-TI satisfies the ontology coverage criteria. 


Second, based on the obtained data, we conducted a task-based evaluation to assess the usability of the proposed 
OCQA-TI ontology to solve a particular task. In this case, we conducted a use case to check if the measured air 
permeability values satisfied the RE2020 constraint. A SPARQL was conducted to identify the location where the 
air permeability value did not fit the constraint to provide the site manager with useful information to support 
awareness of the site condition and take actions to solve issues that occurred to ensure the upcoming tasks. The 
SPARQL query and result appear in Table 3. The result shows that the air permeability value in Zone B was 1.2, 
exceeding the maximum value in RE2020. Thus, the site manager should consider reducing the air permeability 
in Zone B to ensure the following insulation installation work. In summary of the task-based evaluation, the 
OCQA-TI can be used to check the inspection with the compliance of constraints from the norms. 
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Table 2: Specified CQs and answers based on the task of the blower door test. 


Information retrieval Answer 


1. What are the characteristics being inspected by -AirPermeability _7832-3783-173 
inspection :7est001? 


2. What are related entities of an inspection? 


a. Where is the location of an inspection? -ZoneA 
b. Who is responsible for the inspection? Inspector XYZ 
c. What is the norm related to the inspection? :RE2020 
d. What equipment is required for an inspection? :NO80YU64K 
3. What are following/precedence inspections of an :Test002 
inspection? 
4. Which procedure is required for an inspection? :BlowerDoorTest 
5. When is the start and end date of the inspection? Start: :04072023-1 End: :04072023-2 


Table 3: SPARQL query and the result of air permeability constraint checking. 


SPARQL query 


SELECT ?test ?location ?value ?characteristic 
WHERE { 
?inspection a ocqa:Inspection . 
?inspection ocqa:hasSubInspection ?test . 
?test dicp:hasLocation ?location . 
?location :hasAirPermeability ?characteristic . 
?characteristic ocqa:hasAssignedCharacteristic Value ?AssignedCV . 
?AssignedCV :maxvalue ?maxvalue . 
?characteristic ocqa:hasActualCharacteristic Value ?ActualCV . 
?ActualCV :value ?value . 
FILTER (?value > ?maxvalue) 


Result illustrated in GraphDB 


test $ location Li value : characteristic $ 


8. CONCLUSION 


In conclusion, poor construction quality remains a significant challenge in the construction industry, leading to 
increased costs and avoidable damage. Therefore, this paper proposed an innovative solution by developing the 
OCQA-TI. This ontology extension was designed to tackle the complexities of inspection planning and evaluation 
in thermal insulation and enhance the overall quality assurance process. 


The research successfully extended the OCQA to incorporate thermal insulation, allowing for the description of 
insulation inspections and finishing trades. By bridging the gap between required and actual inspections on 
construction sites, the ontology provided proper knowledge, semantics, and terminology specific to thermal 
insulation, supporting inspectors and project teams in making informed decisions during an inspection. 
Additionally, the ontology facilitated gathering and integrating heterogeneous data from various software 
applications, promoting the traceability of information between systems. It also offered a practical solution for 
handling conflicting product guidelines, allowing references to guidelines instead of norms, thereby ensuring dual 
compliance with specific product requirements and regulatory norms. 


This ontology development is expected to significantly improve construction site quality by enabling proactive 
defect prevention and enhancing the efficiency of inspection planning. With comprehensive and project-specific 
inspection plans readily available, construction teams can reduce the impact of work by better adhering to the 
inspection plans and rectifying any issues promptly. However, while the OCQA-TI ontology presents a valuable 
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contribution, no ontology can be considered the definitive or best solution. As construction practices, regulations, 
and technologies evolve, continual improvement and updating of the ontology will be necessary to maintain its 
effectiveness. 


Admittedly, there are several limitations of this study which are summarized as follows. First, the proposed 
ontology-based approach of thermal insulation inspection planning relies on the construction domain knowledge 
such as norms, standards, and production codes, which may be different in different regions or countries. For 
example, in this research, we adopt RE2020 to model the constraints, which is the French regulation that may not 
be applied to other countries. Therefore, to improve the usage of OCQA-TI globally, further efforts are needed to 
collect more international domain knowledge into the ontology. However, it is difficult to achieve by a single 
research team. Thus, in the future, we are aiming to provide a shared knowledge platform based on the OCQA-TI 
to collect more comprehensive knowledge for inspection by interaction and collaboration of users in different 
countries. Second, ontology needs to be continuously updated and maintained corresponding to changes of the 
domain knowledge and accommodating new inspection procedures, equipment, and reference values. Third, the 
ontology has not been fully applied or utilized on construction sites. Therefore, future research may expand the 
ontology to cover additional construction trades, enabling a comprehensive and interconnected quality assurance 
framework for various construction activities. Moreover, investigating automated inspection planning for the 
OCQA-TI using rule languages or other advanced technologies could further enhance the ontology’s usability and 
efficiency. Additionally, there is the challenge of testing these technologies on a construction site in a real-world 
context, while also considering the socio-economic implications for construction workers and inspectors. 


Overall, the OCQA-TI ontology offers a knowledge-driven approach to thermal insulation inspection planning, 
promoting enhanced information retrieval and data integration while supporting quality assurance decisions. By 
enabling early defect detection and addressing implementation issues, the ontology contributes to improved energy 
efficiency, interior comfort, and reduced building maintenance costs. As the construction industry continues to 
evolve, adopting such knowledge-driven approaches is crucial for ensuring safer, more durable, and energy- 
efficient buildings in the future. 
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CONSISTENCY IN PRICE LIST TENDERING DOCUMENT 
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Polytechnic of Milan, Italy 


ABSTRACT: Effective cost estimation for tendering plays a critical role in the building construction process, 
enabling efficient investment management and ensuring successful execution of the construction phase. Traditional 
cost estimation procedure involves manual information processing to extract and match technical data from textual 
description construction resources. This activity requires practitioner deep experience and manual effort, often 
resulting in errors and, in the worst scenario, judicial disputes. 


In response to the increasing demand for structured information and automated processes, this study addresses 
the need for Public Administrations to achieve better control over the data contained in public tendering 
documents provided to practitioners. To fulfill this objective, a framework is proposed to automatically retrieve 
information from these documents, serving as a support tool to map items within the documents, highlight missing 
data, and critical semantic ambiguity. 


The designed framework aims to develop a tool for automatically identifying similarities between work items and 
their corresponding elementary resource items in Price List tendering documents. By leveraging the information 
retrieval NLP technique of cosine similarity through TF-IDF, a methodology was developed to support and 
facilitate practitioners’ activities. Finally, the framework was tested on four case studies extracted from Lombardy 
Regional Italian price list documents showing that the resulting support tool is able to automate the analysis 
process and efficiently reveal inconsistency. The model successfully extracted and correctly matched the 
elementary resource to the corresponding work query in 75% of the cases where the elementary resource was 
present in the list. Additionally, the model proved to be a valuable tool in helping practitioners identify missing 
resources. 


KEYWORDS: Automated cost estimation, Information retrieval, Text similarity, NLP, Tendering document, 
Public Administrations 


1. INTRODUCTION 


Cost estimation plays a pivotal role in effective decision-making within construction project management. 
Numerous studies underscore its significance (M.E.Sepasgozar et al., 2021). However, traditional construction 
cost estimation often involves multiple manual processes, with limited automation, resulting in time-consuming 
efforts and susceptibility to human errors (Akanbi & Zhang, 2021). Despite the increasing use of BIM approaches, 
information exchange in AEC industry is still mainly based on the production of paper-based documents. These 
documents are often written in natural language, conveying knowledge through unstructured or semi-structured 
data. Natural language is by nature unstructured, and it is therefore difficult to be digitally managed. Manually 
data extracting can lead to discrepancies and inaccuracies in information, thereby posing financial risks, project 
delays, potential failure, and in the worst case scenario to judicial disputes (Jafari et al., 2021a). These issues 
impact the effectiveness of the projects along with the credibility/reputation of the stakeholders. Moreover, the gap 
between traditional document-based (i.e., semi-structured and unstructured) and model-based information can lead 
to information loss and inconsistency. (Opitz et al., 2014). Thus, effective data management turns out to be essential 
to the overarching project strategy. 


Public administrations play a key role in the construction process, especially in the context of public procurement 
where they assume the role of contracting authorities. These entities encounter a large daily influx of data, much 
of which needs to be made accessible to external stakeholders. Among these diverse datasets, a significant portion 
consists of unstructured textual information. Consequently, there is a need for these administrations to enhance 
their data management capabilities. 


Effectively addressing these challenges necessitates the adoption of a methodology capable of efficiently handling 
and structuring the considerable volume of semi-structured and unstructured data for tasks such as cost and time 
estimation. Within this framework, data pre-processing emerges as a fundamental phase, acknowledged for its role 
as the most time-intensive aspect of text classification. 
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To address the aforementioned issues, and to meet the growing demand for public administrations and practitioners 
to convert textual information into digital formats, this research proposes a methodology to develop a procedure 
for checking information consistency within price list tendering. This is achieved through the application of 
Natural Language Processing (NLP) techniques, ensuring the coherence of information within the document. The 
methodology focuses on confirming the alignment between work descriptions and the employed elemental 
resources. The proposed research activity follows the prior study focused on automating the process of structuring 
data from textual documents (Gatto et al., 2023). Specifically, this research shifts the focus on responding to the 
public administration's need to assess the consistency of information within the regional price list before 
structuring and subsequently providing it to the user, by verifying the correspondence between the textual 
information related to the construction works and the textual information of the respective elementary resources 
involved. 


To minimize semantic ambiguity and enhance machine comprehension without human intervention, data in textual 
documents can be handled using NLP, which has demonstrated its efficacy in supporting human activity (Zabin et 
al., 2022). In this direction, (Tang et al., 2022a) developed NLP and rule-based algorithms to automate the 
information extraction from work descriptions in building construction. They integrated different algorithms such 
as Hidden Markov model and improved the accuracy by 89% compared to other common named entity recognition 
algorithms. However, despite the wide application of NLP in different construction fields, the application of these 
techniques in the pre-design phase is still a research gap in the literature (Locatelli et al., 2022). Data pre-processing 
of unstructured data is known as the most time-consuming phase of text classification in the whole process 
(Munkova et al., 2013). 


This research is organized as follows: The initial section, "State of the Art," presents the research background. 
Following that, the "Research Methodology" and "Framework Development" sections detail the study's approach 
and implementation. The subsequent segments, "Testing Framework" and "Results and Discussion," demonstrate 
the practical application of the framework and its evaluation. Ultimately, the "Conclusion" section encapsulates 
the key findings from the study. 


2. STATE OF THE ART 


Manual extraction of reporting requirements from extensive construction documents can lead to time and cost 
underestimations. In this direction, the application of NLP techniques has been increasingly adopted in the AEC 
sector to manage the information contained in documents ((Jafari et al., 2021b); (J. Zhang et al., 2020)). NLP is 
mainly applied in four scenarios of information extraction, document organization, expert systems, and automated 
compliance checking (Wu et al., 2022). 


Recently, NLP has been used in the construction industry to facilitate cost estimation through document 
management (Tang et al., 2022b). To automate extracting information from construction regulatory documents, a 
study has been developed by applying a semantic rule-based NLP approach for a text recognition algorithm based 
on semantic analysis (J. Zhang & El-Gohary, 2016). In a later study, an automated framework was developed using 
NLP and machine learning techniques to automatically recognize and prioritize important contract terms, enabling 
managers to quickly and fully understand contract agreements (Hassan & Le, 2020). Furthermore, a model that 
automatically identifies the most relevant pairs of provisions from various specifications using semantic text 
similarity was developed (Moon et al., 2021). This assists practitioners by reducing the effort to complete tasks 
that involve written documents, enhancing the objectivity of outcomes, and minimizing human errors. 


In the construction industry, dealing with inconsistent information, semi-structured and unstructured data in price 
list documents is an ongoing challenge. The causes of these issues often include human errors during data entry, 
outdated price lists, and issues with the software tools used for creating BIM models and price lists (Cha & Lee, 
2018). These issues can cause inaccurate cost estimates, budget overruns, delays in project timelines, and 
disagreements between project stakeholders. 


Concerning the inconsistency and ambiguity checking, a recent research activity has been performed using the 
support vector machine (SVM) supervised learning model methodology, leading to an automated method for 
detecting ambiguity in building requirements, which can then be reviewed and interpreted by domain experts to 
support the automated compliance checking process (Z. Zhang & Ma, 2023). In another work, the authors proposed 
an uncertain knowledge graph-based method to eliminate potential conflicts and acquire the ‘most likely scenarios’ 
by integrating multiple representations of building information (Xie et al., 2023). 
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Document classification is crucial in the process of digitizing and structuring information. Text data pre-processing 
serves as a foundational step for this process (Lee & Yi, 2017). Text classification as part of NLP, involves the 
automated categorization of text. This task can be accomplished using two main approaches: rule-based techniques 
and Machine Learning (ML) algorithms. 


The conversion of textual information into digital formats is necessary for the building tendering process. The 
digitalization process not only enhances the accessibility and usability of the information but also opens up new 
scenarios for data analysis and decision-making. Application of advanced technologies such as NLP and machine 
learning, can automate the structure of price lists and show similarities between products and their corresponding 
resource items. This can help practitioners improve the consistency of information in documents. 


2.1 Research gaps and challenges 


As the construction industry continues to embrace digital transformation, the application of NLP has emerged as 
a promising area of research. NLP has the potential to revolutionize various aspects of construction, from project 
management to BIM. However, the integration of NLP into the construction sector faces some challenges that are 
explained in this section to highlight future research interests in this field. 


(Ding et al., 2022) provided a review of the NLP-related research articles in the construction field and they pointed 
out related challenges such as data accessibility/monopoly to develop the intelligent agent and data diversity from 
various devices, such as text, images, sensors, and audio, presenting a challenge in developing comprehensive 
models with higher performance. Additionally, they mentioned that achieving full automation and high-level 
reasoning requires advanced extraction and understanding of models because NLP models can struggle with 
understanding the complex technical context in construction. 


Another challenge is achieving semantic interoperability between BIM and NLP in the construction industry. The 
use of ontology as a bridging tool is a potential solution to address this gap, yet this area remains underexplored 
(Locatelli et al., 2021). 


2.2 Tendering documents 


Tender documents serve as a communication tool between the project owner and potential contractors, outlining 
project specifications, execution conditions, and the rights and obligations of all parties. The clarity of these 
documents is important to avoid financial disputes. The type of tender documents depends on the procurement 
method and contract type, and typically include drawings, specifications, and bills of quantities (Cunningham, 
2015). 


The quality of tender documentation can significantly impact on the procedure, alongside other factors like contract 
content and tender management. Despite the challenges, contractors must carefully prepare their bids to increase 
their chances of securing the contract (Lesniak & Janowiec, 2020). 


Construction projects, seen as transient businesses, require careful project management, particularly during the 
tendering process. This process, which involves numerous variables and substantial resources, is influenced by 
factors such as the financial stability of contractors, offered price, delivery timeline, experience, environmental 
considerations, and personnel qualifications (Naji et al., 2022). 


The Public Italian Contracts Code, art.23 D.lgs 18 Aprile 2016, n.50, imposes on each Italian region to annually 
provide a price list that contracting authorities have to use for setting the project cost base for tenders. Therefore, 
each region provides practitioners with a price list containing work items and their respective cost (Sdino & 
Rosasco, 2021). The tool mainly stores data associated with construction activities, including their unit prices. This 
resource assists practitioners in generating estimated metric calculations. Additionally, to ensure more 
transparency in the composition of the price of construction works, the price list provides a catalog of elemental 
resources involved in the latter. Therefore, there must be a full correspondence between works and elemental 
resources, otherwise inconsistency arises. Considering the need for annual updates, the price list is subjected to 
periodical revisions, consisting in the unit prices update, the addition of new work and elementary resource items 
or removal of outdated entries. 


Information is conveyed by the tool in verbal form: sentences composed by words and syntax delivering 
knowledge. Since each item is written in natural language and because the document doesn’t follow a standard in 
providing information, a lack of homogeneity has been recorded between each item phrase structure and 
information typology transmitted. Public Administrations therefore are looking for tools to help them structure 
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high amount of data. 


3. RESEARCH METHODOLOGY 


The methodological approach employed in the presented study is explained in this section and summarized as 
depicted in Figure 1Figure 1 Methodology chapter. Firstly, the development of the study starts by listening to the 
needs of public administrations in the AEC sector, who have been engaged to comprehend the challenges they face 
in managing information during public tendering processes. Following the prior study focused on automating the 
process of structuring data from textual documents (Gatto et al., 2023), aneed emerged from public administrations 
to incorporate a preliminary step to assess the consistency of information within the regional price list before 
structuring and subsequently providing it to the user. The subsequent step was to explore the state of the art, aiming 
to delve into tools and methodologies applied for automating the management of unstructured data. This was 
followed by the development of a framework, leveraging information retrieval NLP techniques. Specifically 
textual similarity recognition based on cosine similarity using TF-IDF, was selected to build a tool designed to aid 
public administrations in establishing associations among essential resources within a construction project, thereby 
highlighting any gaps that may exist. This technique has proved to be valuable and effective within the domain of 
text similarity and in this study it is applied in the specific field of cost estimation, focusing on price list documents. 
Finally, the framework was tested in a practical case study to assess its effectiveness. 


Framework 
design and 
developement 


Framework 
testing 


Understanding 
industry needs 


Start 


Figure 1 Methodology chapter. 


4. FRAMEWORK DEVELOPMENT 


This section presents the design and development process of the framework, as synthesized in Figure 2. The 
objective of this phase is to develop a comprehensive procedure for checking information consistency within price 
list tendering documents using Natural Language Processing (NLP) techniques, specifically aimed at verifying the 
correspondence between the textual information related to the construction works and the textual information of 
the respective elementary resources involved. The primary goal of public administrations is to provide users with 
a tool that ensures utmost clarity, free from semantic ambiguities and inconsistencies during the crucial phase of 
estimating construction costs. 


Construction Bementary 
work list resource list 


Apilying cosine 
similarity 
technique 


Collecting the Data 
dataset Preprocessing 


Analysing 
output data 


Figure 2 Framework flowchart. 


By leveraging NLP technique, this framework allows the public administration to speed up the process of 
addressing elementary resource item to the work items where it is involved, thus helping with the detection of 
missing information and inconsistencies, ensuring the accuracy and reliability of how the data is presented to the 
user. The parameterization of information through NLP techniques empowers public administrations with greater 
control over the textual dataset, facilitating easier data manipulation and analysis. Furthermore, this process seeks 
to address the longstanding issue of ambiguity that often arises from cost item descriptions, ensuring a higher level 
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of accuracy and precision in subsequent analyses and decision-making processes. 
The developed framework mainly consists of four stages. 


The first step in the process involves collecting the dataset, which comprises both the list of completed works and 
the corresponding list of elementary resources involved. The work's textual descriptions typically explicitly state 
the elementary resources utilized in the activity, allowing practitioners to identify them accurately. Once the dataset 
is assembled, it undergoes a pre-processing phase to ensure the correct execution of the subsequent steps. 


Since the description of the elementary resources is expected to be contained in the description of the works, the 
cosine similarity technique using TF-IDF is chosen for the development of this framework. This measures the 
similarity between two vectors, A and B, by calculating their dot product and dividing it by the product of their 
magnitudes as: 


(JA||B| * cos (a))/(AI|B)) 


The resulting value ranges between 0 and 1, where 0 indicates no match (completely dissimilar vectors), and 1 
represents complete similarity (vectors pointing in the same direction). This metric is widely used in text analysis 
to measure document similarity. TF-IDF is primarily concerned with determining the importance of words within 
individual documents and in literature is commonly used for tasks like information retrieval and document ranking, 
and in text. Alternatively, the utilization of cosine similarity through Word2Vec is centered around capturing 
semantic meanings. This is achieved by representing each word as a dense vector within a continuous space. 
Notably, certain studies have suggested that for text similarity, TF-IDF often outperforms the other method, 
highlighting its effectiveness (Sitikhu et al., 2019). 


Once the dataset has been collected and the NLP technique to be used identified, the next step is to vectorize the 
list of elementary resources dataset and query them with the respective work descriptions, with the aim of deriving 
the elementary resource item associations. The cosine similarity technique allows to rank the list of elementary 
resources based on their similarity to the work descriptions, with the most similar resources being assigned with 
higher score similarity value. 


After obtaining a ranked list of elementary resources associated with each work description, building construction 
cost estimation practitioners were involved to validate the accuracy and effectiveness of the output results. They 
carefully reviewed the output and assessed whether the correctness of the output with a higher score rate. The 
expert validation process not only serves as a critical quality control measure but also provides valuable insights 
and feedback, enhancing the overall robustness and practical applicability of the framework. 


5. TESTING FRAMEWORK 


In this section, the testing process of the framework is presented, with the aim of assessing its capability to 
accurately determine the appropriate elementary resource for each work item. The evaluation was conducted using 
four sample case studies extracted from the Lombardy Region Price List document (January 2023 version). The 
objective was to verify the framework's ability to assign the correct elementary resource to four specific types of 
works: masonry clay block, masonry concrete block, tile floor, and thermal insulation work. 


useful = 

le -= rows seless record ii 'eprocessed 

vanr xixs fi cokane empty row: useless re pairing Preproce: wore Bt 
acquisition removal removal descriptions dataset 


selection esses, 
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elementary 
fesoruces list 


Figure 3 Preprocessing flowchart. 


To ensure effective vectorization and handling of relevant information, dataset pre-processing plays a crucial role. 
The steps undertaken during this phase are depicted in Figure 3. As explained later, the pre-processing is focused 
on isolating and extracting the pertinent textual data; however, it does not involve further cleaning, such as 
removing stop words or those that appear with high or limited frequency in the text. The objective is to retain the 
essential context and meaningful information while preparing the data for vectorization and subsequent analysis. 
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Data was acquired from a single spreadsheet. where the knowledge organization follows a semi-structured format 
as shown in Figure 3, consisting of seven columns in the following order: item ID code, textual description, unit 
of measurement, unit price, percentage of labor incidence, percentage of material incidence, and percentage of 
equipment incidence. 


CODICE DESCRIZIONE U.M. P.U. X Inc. M.O. X Inc. MAT X Inc. ATT 
10.06.100 MURATURE FACCIA A VISTA NaN NaN NaN NaN NaN 

NaN NaN NaN NaN NoN NaN NaN 
1C.06.100.0050 Muratura faccia a vista con mattoni pieni tipo NaN NaN NaN NaN NaN 
Nan NaN NaN NaN NaN NaN NaN 
1C.06.100.0050.a -coa mationi 25 x 12 x 5.5 cm, spessore 12 cm m $831 33,90 41,38 NaN 
Nan NaN NaN NaN NaN NaN NaN 
1€.06.100.0050.b - con mationi 25 x 5.5 x 5.5 cm (bastonetio me 88,99 3,52 40,52 NaN 
NaN NaN NaN NaN NaN NaN NaN 
1C.06.100.0100 Muratura faccia a vista con mattoni semipieni NaN NaN NaN NaN NaN 
Nan NaN NaN NaN NaN NaN NaN 


Figure 4 Raw dataset. 


The raw dataset is characterized by many rows with null records, therefore, after the acquisition of the Lombardy 
Regional Price List document, the first step performed for cleaning the dataset is the removal of empty rows. 
Subsequently, only the useful columns have been selected, containing the ID code information and textual 
description of items. 


Moreover, the hierarchy knowledge of information, such as chapters and sub-chapters, is provided by the price list 
tool by code length. The shorter the code, the higher is the hierarchical level of information; conversely, the longer 
the code, the deeper is the hierarchical level of information transmitted. In Figure 4, the first record characterized 
by the code "1C.06.100" represents the chapter referring to the exposed brick wall works, while the records with 
longer codes are the specific work activity within the masonry chapter. For the achievement of the objectives set 
within this paper, the data type to work with belongs to the last hierarchical level, where data are conveyed with 
the highest level of detail through textual items description. Work and elementary resource descriptions which a 
unit price is associated with are characterized by a code length higher than 13 digits. 


Since elementary resources could be described at both single code and parent-child code levels, the last pre- 
processing step involved pairing the child entries with their respective parent entries, ensuring the framework's 
accuracy in resource assessment, as shown in Figure 5, where the process of pairing items description is shown. 


By following these steps, we ensured that only relevant and properly organized data were used in the testing 
process, further validating the framework's effectiveness in elementary resource allocation for construction works. 


The following stage consists of the cosine similarity technique application. Scikit-learn library have been used for 


('MC.06.050.0025", ‘Exposed bricks 6 x 11 x 23 cm sandblasted") ,§[("MC.06.050.0025", ‘Exposed bricks € x 11 x 23 cm sandblasted’), 
("MC.06.050.0030", ‘Semisolic exposed br :*) ('MC.06.050.0030.8", ‘Sexisclid exposed bricks:- brick 25 x 10 x $5.5 cm'], 
("MC.06.050.0020.8", ‘- b 25 x1 ('MS.06.050.0030.b", ' 1- 1 


"MC.06.050.0030.b", ' 
06.050.0030.c", ' 


( ('MC.06.050.0030.c", ' 
(UHC. 

('MC.06.050.0030.d', ° 

(CMC. 

(MC. 


('MC.0€.050.0030.d", ' 
['NC.06.050.0030.e', 
('MC.06.050.0030.f", ‘Sexisolid exposed bricks:- double UNI 25 x 12 x 12 cm’) 


06.050.0030.8", *- bri 
06.050.0030.£", ‘= do 


t 
xxxxxx 


Figure 5 From left to right, the process of pairing child entries with their respective parent entries. 


this purpose importing in a Google Colab notebook the TfidfVectorizer class and using the function 
cosine_similarity. 


Descriptions from the elementary resource list are converted from textual to numerical representation through TF- 
IDF feature extraction action, obtaining a sparse matrix. This process effectively maps the vocabulary of the 
dataset's domain knowledge. As a result, each phrase in the dataset is transformed into a vector within the TF-IDF 
feature space, representing its unique characteristics in relation to the entire corpus of documents. 


Later, the same process is repeated for a single description query, which comes from the works list. This query is 
transformed into a TF-IDF feature vector using the TfidfVectorizer that was fitted on the product descriptions. 


Finally, the cosine_similarity function calculates the similarity between the query and all records (product 
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descriptions) in the product list. This provides a similarity score for each elementary resource within the list with 
respect to the query. The products characterized by the most similar description to the query are retrieved by sorting 
the similarity scores in descending order, from the highest to the lowest value. For this study, it was decided to 
recall only the first 5 products, leaving out the later ones. 


In the table below it is shown an example of the framework, by querying the list of products with the 
“1C.06.050.0100” work, whose description is “Semi solid masonry wall, 8 x 12 x 24 cm, with cement mortar, 
including the charge for the formation of shoulders, vaults, corners, pilasters, internal worktops”. The first 
elementary resource returned by the proposed methodology is the correct characterizing product of the enquired 
work. The overall proposed framework exploits cosine similarity through TF-IDF techniques for retrieving 
products in the Price List document. Those are ranked based on their similarity scores, allowing the user to identify 
the most relevant matches for the query. 


The framework is tested on four samples. Each sample represents a different domain (masonry clay blocks, 
masonry concrete blocks, tile floors, and thermal insulations) with unique terminologies and sentence structures. 
By testing the developed methodology on a diverse dataset, it is possible to assess its scalability on different 
knowledge subdomain. 


The last step of the framework requires the evaluation of the output, performed by practitioners, who assesses the 
correctness of the first output, characterized by the highest score rate. 


Table 1 Framework output, queried with “1C.06.050.0100” work. 


Score rate ID code description 

0.50 MC.06.050.0040.a Semi-solid bricks:- semisolid brick 8 x 12 x 24 cm 

0.43 MC.06.050.0040.b Semi-solid bricks:- semisolid brick 8 x 24 x 24 cm 

0.38 MC.06.050.0040.e Semi-solid bricks:- double UNI semi-solid brick 24 x 12 x 12 cm 

0.34 MC.06.050.0045.c Semi-solid bricks complying with UNI EN 771-1 and the Minimum Environmental 


Criteria set forth in the Decree of 23 June 2022 of the Ministry of Ecological 
Transition, for the construction of partitions or counterwalls; type - dimensions 
(length x width x height) in cm - perforation (%<) - thermal conductivity (A) 
according to UNI 1745 of dry brick - fire resistance with normal and fireproof 
plaster* - soundproofing power:- block with horizontal holes 30x4.5x15 cm - dB 39 


Solid bricks 25 x 12 x 5.5 cm complying with UNI EN 771-1 and the Minimum 
Environmental Criteria set forth in the Decree of June 23, 2022 of the Ministry of 

0.33 MC.06.050.0015 Ecological Transition, for the construction of load-bearing masonry according to 
NTC 2018, thermal conductivity (A) according to UNI 1745 of dry brick 0.431 
W/mK 


6. RESULTS AND DISCUSSION 


In this section, a preliminary phase of analysis and discussion of the analyzed dataset is presented. This approach 
allows us to have a more comprehensive view of the results obtained before delving into further discussions. 


6.1 Dataset preliminary analysis 


A preliminary analysis was performed in order to provide a better description and visualization of the four selected 
dataset. Table / collects some significant data on which the evaluations are performed. It provides the size of the 
tested sample, both for works and elementary resources, for the subdomain knowledge analyzed (Clay brick wall, 
concrete brick wall, tiles, and thermal insulation). Furthermore, it provides the description with major, minor, and 
average number of words per work and elemental resource. A delta is also given to highlight the differences 
between works and elemental resources. 
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As it is possible to see from the table below, a homogeneous number of 40 work items have been analyzed for 
each test campaign. The related number of elementary resources varies according to the type of knowledge 
subdomain, ranging from a minimum of 33 to a maximum of 64 items. 


Concerning the analysis of textual descriptions, the number of words in them was investigated. The clay Brick 
Wall campaign is characterized by 50.8 average words per work item. and 30.7 average words per elementary 
resource item, registering a delta of 20.1 words. Among the test campaigns, the Concrete Brick Wall campaign 
stood out with the longest descriptions, averaging 123.9 words per work item. In contrast, the Elementary 
Resources campaign displayed a mean word size of 59.9, exhibiting the largest word delta between the two. 


The Thermal Insulation test campaign also featured lengthy descriptions. In this case, the average word count for 
both elementary resources and jobs was quite similar. On the other hand, the Tile Floors knowledge subdomain 
had the shortest textual descriptions for both elementary works and resources; moreover, the average length of the 
descriptions for works and elementary resources nearly matched, resulting in an almost zero delta. 


Based on this overview, it is evident that there are differences among the various knowledge subdomains. 
Specifically, the Concrete Brick Wall subdomain requires a greater number of words to convey information. 
Additionally, the descriptions of works within this subdomain offer more extensive information compared to their 
corresponding elementary resources. Conversely, the Tile Floor subdomain necessitates fewer details in its 
descriptions, with both works and elementary resources providing relatively concise information. These 
differences highlight the varying informational needs and content richness across the different knowledge 
subdomains. 


Table 2 Preliminary dataset analysis. W: work items; ER: elementary resource items. 


Clay brick wall Concrete brick wall Tile floor Thermal insulation 
W ER A W ER A W ER A W ER A 
n° of items 40 33 7 40 50 -10 40 59 -19 40 64 -24 
Max length 105 89 16 140 121 19 82 94 -12 150 121 29 
Mean length 50.8 30.7 20.1 123.9 59.9 64 46.3 46.5 -0.2 85.6 98.1 12.5 
Min length 13 10 3 54 28 26 10 10 0 49 33 16 


6.2 Discussing framework result 


Table 4 in this paragraph presents the results obtained, by displaying in the first row the number of times the model 
successfully assigned the elementary resource to the work query. Conversely, in the second row, the table shows 
the number of times when the model was unable to assess the correct output. The last row provides the number of 
missing elementary resource items. According to the domain of knowledge, different outcomes were achieved. 


The poorest results were recorded for the Concrete Brick Wall and Clay Brick Wall domains. Also, as previously 
shown in table 3, these domains exhibited a substantial delta between the number of words between works and 
elementary resources, indicating that the works conveyed more information than the descriptions of elementary 
resources. For the Clay Brick Wall sample, 19 tests over 40 provided incorrect output, being 47.5% of the total 
tests; however, practitioners verified that in the 57.8% of incorrect outputs, the elementary resource is missing in 
the price list. Concerning the Concrete Brick Wall sample, 26 tests over 40 provided incorrect output. being 65% 
of the total tests. 


Furthermore, it was observed that in certain instances, standardizing works and elementary resources resulted in a 
shift from negative to positive outcomes. Table 3 shows the output of the work query “Load-bearing masonry made 
of hollow core brick blocks, thermo-acoustic, with cement mortar, including formation of vaults, pilasters, corners; 
with:- simple blocks 13 x 30 x 19 cm, thickness 13”. The model does not return the correct elementary resource 
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as the first output, but rather as the third ranked output. Just by equalizing the lexicon from “block” to “blocks”, 
like the other items are, the similarity score of the correct output turns from 0.37 to 0.44, with a higher score. 


Table 3 Framework output, queried with “1C.06.050.0300.b” work. 


Score rate ID code description 

0.39 MC.06.100.0010.b Thermal insulation blocks, 45% drilling:- interlocking blocks, 30 x 25 x 19 cm 
0.39 MC.06.100.0010.a Thermal insulation blocks, 45% drilling:- interlocking blocks, 25 x 30 x 19 cm 
0.37 MC.06.100.0010.c Thermal insulation blocks, 45% drilling:- simple block, 13 x 30 x 19 cm 


Conversely, Tile Floor and Thermal Insulation domains exhibited a larger number of positive outcomes, registering 
respectively 35 and 37 correct output tests out of 40, 87.5% and 92.5% respectively. In these cases, unlike previous 
tests, the descriptions of works and elementary resources had a smaller word delta. For Thermal Insulation, 
practitioners verified that in 67% of the negative outcomes, the correct data was indeed missing from the 
elementary resources list. 


Table 4 Testing framework results. 


Clay brick wall Concrete brick wall Tile floor Thermal insulation 
correct 21 {7 35 37 
incorrect 19 26 5 3 
Missing ER Il 0 0 2 


7. CONCLUSION 


The research presented in this study contributes to the empowerment of public administrations in effectively 
managing data from Price List Documents, aligning with the growing necessity to transition textual information 
into structured and machine-readable formats. Building upon the investigation conducted in the research titled 
(Gatto et al., 2023), this study introduces a methodology capable of extracting elementary resource information 
from a price list document and linking it to the corresponding construction work using cosine similarity NLP 
techniques. 


The model successfully extracted and correctly matched the elementary resource to the corresponding work query 
in 75% of the cases where the elementary resource was present in the list. Additionally, the model proved to be a 
valuable tool in supporting practitioners in identifying missing resources. A limitation associated with this 
approach is its reliance on retrieving explicitly mentioned information from the text. Indeed, it cannot derive 
implicit information. 


The study also highlighted that due to the lack of standardization in conveying information, the machine 
encountered ambiguity in interpreting the text. It emphasized the importance of adopting a more standardized 
approach to delivering information, making it more understandable not only to machines but also to humans. 


It is important to note that the developed framework does not replace human activity but rather acts as a supporting 
tool. Human verification and validation of the model's outputs are essential. 


Given the success of this framework, future developments aim to extend its application to broader contexts, with 
a focus on extracting cost information from various textual documents, including technical specifications and price 
list documents, and linking them to verify the consistency of cost information with the data contained in BIM 
models. 
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ENHANCING INTERACTIONS IN AUGMENTED REALITY FOR 
CONSTRUCTION SITES: INTRODUCING THE ARCHI ONTOLOGY 


Karim Farghaly, Khalid Amin, Grant Mills & Duncan Wilson 
The Bartlett School of Sustainable Construction, University College London 


ABSTRACT: Augmented reality (AR) systems offer new possibilities for enhancing how people interact with 
information and their environment in the construction sector. However, traditional software-driven approaches to 
AR system design have limitations in creating intuitive user experiences. This research presents a new user-centric 
framework and ontology for BIM-AR system development focused on human needs and perspectives. The BIM-AR 
Framework consists of a 5-step circular hybrid process with the user at the center. To enable knowledge sharing, 
the Augmented Reality Computer-Human Interaction (ARCHI) ontology was developed using Protégé based on 
established design principles. Initial validation indicates the framework's potential for improved AR system design, 
but further expert review and case studies are needed. The ontology also requires additional refinement and linkage 
to open data. This pioneering research lays the groundwork for next-generation AR systems that emphasize 
usability by taking a human-focused approach. With rigorous validation and evolution, the framework and 
ontology could transform AR technology development to create more purpose-driven and adopted solutions. This 
research represents a paradigm shift to user-centric AR system design that has significant potential to improve 
how augmented reality enhances construction project management. 


KEYWORDS: Augmented Reality, Building Information Modeling, BIM, Linked Data, Ontology, Human 
Computer Interaction. 


1. INTRODUCTION 


Augmented reality (AR) refers to technology that overlays digital information and objects onto the real-world 
environment in real-time (Azuma et al., 2001). This is achieved by supplementing the user's view with computer- 
generated input such as text, images, video, audio, and GPS data. In recent years, AR has emerged as a 
transformative technology with a diverse range of applications across industries like healthcare, education, 
manufacturing, and construction. Within the construction industry, AR is being explored as a means to enhance 
on-site work processes and information visualization. By overlaying 3D models, assembly instructions, or other 
data directly onto physical construction sites, AR enables workers to intuitively interact with digital information 
in context (Rankohi & Waugh, 2013). Specific applications include visualizing building designs and underground 
infrastructure, annotating issues for repair, remotely guiding workers through assembly tasks, and detecting risks 
or errors in construction (Behzadan & Kamat, 2009). AR and Building Information Modelling (BIM) can reduce 
workspace clutter, improve information communication, and integrate digital tools directly into the work 
environment (Um et al., 2023). 


However, there are several challenges to widespread AR adoption in construction. Current AR solutions are often 
provider-specific proprietary platforms that lack interoperability (Um et al., 2023; X. Wang et al., 2013). This 
makes integrating AR into existing construction workflows difficult, as data may not transfer seamlessly between 
different vendor tools. Additionally, much AR research has focused on novel visualization techniques rather than 
task-based, user-centric design (Amin, Mills, & Wilson, 2023; Behzadan & Kamat, 2009). As a result, usability 
and practical utility for on-site workers requires further improvement. To drive end-user acceptance, AR solutions 
need to tie tightly to actual construction tasks and processes, with UX design centered around user needs (Rankohi 
& Waugh, 2013). Realizing AR's full potential in construction will require developing flexible and standardized 
AR platforms that can be tailored to diverse use cases. Rather than one-size-fits-all vendor products, open AR 
ecosystems are needed where components can be mixed and matched (K. Wang et al., 2023; X. Wang et al., 2013). 
Tying these tools directly to construction workflows and end-user requirements will be key. With improved 
integration and usability, AR can transition from isolated proofs-of-concept to transformative mainstream 
applications in construction. 


2. BACKGROUND 


In recent years, AR has emerged as a potentially transformative technology for the architecture, engineering, and 
construction (AEC) industry. By superimposing digital models, data, and instructions directly onto physical 


Referee List (DOI: 10.36253/fup_referee_list) 
FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup_best_practice) 


Karim Farghaly, Khalid Amin, Grant Mills, Duncan Wilson, Enhancing Interactions in Augmented Reality for Construction Sites: Introducing the Archi 
Ontology, pp. 848-855, © 2023 Author(s), CC BY NC 4.0, DOI 10.36253/979-12-215-0289-3.84 


construction sites and assets, AR enables more intuitive visualization and interaction with information in context. 
Researchers have explored AR applications across the construction lifecycle, including design visualization, 
construction planning, progress monitoring, quality inspection, maintenance, and safety training (K. Wang et al., 
2023). One major area of research has been integrating AR with Building Information Modeling (BIM) to extend 
the utility of virtual BIM models to physical construction settings. BIM refers to digital 3D models of buildings 
containing rich parametric data on components and systems. Linking geo-located BIM models with AR allows 
contextually relevant on-site visualization and interaction with model data (Amin, Mills, & Wilson, 2023). This 
has been posited to improve communication, decision-making, work planning, quality control, and collaboration 
between on-site and off-site teams during construction and operations (Elshafey et al., 2020). 


However, studies note that widespread field adoption of BIM-AR remains limited, especially in developing 
countries, due to technical and organizational challenges (Sidani et al., 2021). Major technical barriers include 
issues with accurate and stable registration of virtual content with the physical environment. This is impacted by 
factors like lighting, network connectivity, and occlusion (X. Wang et al., 2013). Developing flexible AR platforms 
that can leverage different positioning techniques based on context has been identified as important. Organizational 
challenges also exist around integrating AR into construction workflows and aligning it with user requirements 
(Amin, Mills, Wilson, et al., 2023). In particular, research gaps remain around understanding user needs and 
perspectives for on-site AR applications. As Amin, Mills & Wilson (2023) discuss, much AR research has focused 
on novel visualization techniques rather than task-based, user-centric design. However, usability and practical 
utility requires aligning AR tightly with actual construction tasks and end-user workflows. Wang & Dunston (2006) 
and K. Wang et al. (2023) similarly argue there has been insufficient investigation of user-centered factors that 
influence the effectiveness of AR for on-site construction tasks. These include aspects like UI design, functionality, 
and ergonomics based on workers' processes and needs. 


Overall, studies emphasize that realizing the potential of BIM-AR in construction requires moving from proofs- 
of-concept to solutions tailored to end-users’ requirements and field workflows. This entails research on AR 
applications within the context of specific construction tasks, roles, and information needs. A user-driven approach 
can help identify high-impact areas where AR adds value for field personnel and integrate AR seamlessly into 
existing construction practices and project delivery processes. Bridging these research gaps around user-centered 
design and task-based workflows will be key to driving user acceptance and widespread adoption of BIM-AR on 
construction projects. In the following section we will discuss the current practice of the implementation of BIM- 
AR solutions. 


3. BIM-AR CURRENT PRACTICE 


The typical process of developing and implementing BIM-AR solutions tends to follow a linear path (Figure 1), 
rather than a circular user-centric approach. Researchers/Project Managers often identify a potential construction 
application area for BIM-AR visualization, such as design review or progress tracking, based on technical 
feasibility rather than validated user needs (Amin, Mills, & Wilson, 2023). Developers then select AR hardware 
and software components to prototype, focusing on demonstrating novel visualization capabilities more than 
usability (X. Wang & Dunston, 2006). A pilot study is conducted to test the BIM-AR prototype in a lab or limited 
field setting, with evaluation criteria tending to be technical performance metrics rather than workflow integration 
or user-centered design (Sepasgozar et al., 2016). If feasible, the prototype may be deployed in a real construction 
project to showcase a “proof of concept’, but these deployments often function as stand-alone tools disconnected 
from broader workflows and BIM processes. User feedback is collected informally, if at all, and BIM-AR solutions 
are not co-designed with end users or iteratively refined based on their input and task needs. Outcomes focus on 
the technical aspects and visualization capabilities, rather than productivity, quality, or other construction industry 
benefits. Consequently, BIM-AR prototypes frequently stall at the proof-of-concept stage without translation into 
commercial solutions or best practices (K. Wang et al., 2023). In summary, the current linear BIM-AR development 
process does not start with identifying user requirements or aligning systems tightly to construction workflows. 
Collaboration with industry stakeholders occurs late, if at all, which limits the real-world utility and adoption of 
BIM-AR innovations. A more circular, participatory design approach is needed, where end user perspectives drive 
the development and evaluation of BIM-AR solutions for construction. 
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Figure 1: The current practice of BIM-AR solutions and research 


4. PLUG&PLAY BIM-AR SOLUTION 


A more circular approach for the effective implementation of BIM-AR solutions in the construction industry is 
proposed. As mentioned before, the current linear process of BIM-AR development has limitations, as it does not 
sufficiently incorporate end user requirements or align solutions with real-world workflows. To address this, we 
suggest a participatory, iterative process, Augmented Reality Computer-Human Interaction (ARCHI), that centers 
end user perspectives for (Figure 2): 


Cultivate Use Case - The starting point should be identifying a practical use case cultivated by end users. 
This includes details on the application area, project phase, location, and key stakeholders who would 
benefit from the BIM-AR implementation. Example use cases could be design coordination, construction 
planning, or facilities maintenance. 

Identify Requirements - With a target use case, multidisciplinary workshops with stakeholders are held 
to determine functional and information requirements. User requirements capture necessary features and 
workflows to efficiently accomplish tasks using BIM-AR. This entails understanding objectives, 
processes, pain points, and needs. Information requirements outline graphical and non-graphical data 
inputs from integrated systems like BIM to achieve the user requirements. 

Map to Key Functions - The user and information requirements are then mapped to key BIM-AR 
functions needed to fulfill the use case. As Amin, Mills & Wilson (2023) proposed, these functions include 
positioning, visualization, interaction, collaboration, automation, and integration. Not all functions are 
necessary; the focus is on those critical for the specific use. 

Select Solution - With key functions defined, the project team surveys potential AR devices, software 
platforms, and components to meet the requirements. Table 1 provides an overview of current BIM-AR 
solutions and capabilities. The goal is finding flexible tools to fulfill the required key functions. 
Implement with Users - BIM-AR experts collaborate with end users to implement the selected solution 
in the construction project context. Workshop sessions ensure it aligns properly with real-world 
workflows while addressing information needs. Agile development principles can help adapt the system 
based on user feedback. 

Assess Against Use case and Requirements - Once deployed, structured assessments evaluate the BIM- 
AR solution against the original functional and information requirements. Metrics quantify performance, 
usability, and impact on productivity, quality, safety, etc. User surveys also provide qualitative feedback 
on enhancements. 


By applying this circular approach, BIM-AR solutions are driven by end user and project requirements rather than 
technical novelty. The focus is on integrating AR seamlessly into existing construction practices to solve real 
problems. Continuous assessment and improvement further refine the system over time and across projects. With 
BIM-AR tools tightly aligned to use cases and stakeholder needs, user adoption and benefits can expand markedly. 
This practical, participatory implementation process is key to unlocking the true potential of BIM-AR in 
construction. 
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Figure 2: ARCHI proposed framework 


Table 1: The existing solutions of BIM-AR (available in the construction market in 2023). 


Solution Hardware Software Type User Interface 
HoloLens-Kognitiv Spark HoloLens Kognitiv Spark Headset Gestural-based 
Varjo XR-3 Varjo XR-3 - Headset Gestural-based 
ATOM-XYZ Cloud Platform ATOM XYZ Cloud Platform Headset Tangible User Interface 
iPad Pro-Gamma AR iPad Pro Gamma AR Tablet Touch-based 
iPad Pro-GenieVision iPad Pro GenieVision Tablet Touch-based 
iPad Pro-Augmentecture iPad Pro Augmentecture Tablet Touch-based 
iPad Pro-ARKI iPad Pro ARKI Tablet Touch-based 
iPhone -Gamma AR iPhone Gamma AR Smart Phone Touch-based 
iPhone -GenieVision iPhone GenieVision Smart Phone Touch-based 
iPhone -Augmentecture iPhone Augmentecture Smart Phone Touch-based 
iPhone -ARKI iPhone ARKI Smart Phone Touch-based 


5. ARCHI ONTOLOGY 


To achieve the proposed framework, it is crucial to identify and capture all the key information needed for effective 
implementation. In this section, we introduce the ARCHI ontology which identifies all the important aspects to be 
captured. For developing a domain or upper ontology, it is essential to follow a set of defined recommendations 
and ordered steps (Farghaly et al., 2023). The process of the development of an ontology consists of seven main 
steps (Noy & McGuinness, 2001). The first step is to define the covered domain and the scope. As mentioned 
before, this research concentrates on the aspects related to BIM-AR solution implementation and presents the 
different tasks needed for that. The second step is to consider reusing existing ontologies. Several classifications 
and taxonomies have been taken in consideration as ontologies in this research such as PVICAT (Amin, Mills, & 
Wilson, 2023) for the key functions, Human Computer Interaction (HCD ontology (Costa et al., 2022) for the user 
and solution classes. The third step is to enumerate important terms in the ontology. In this step, terms are extracted 
to form a list of concepts (classes, relationships, and slots) from the data schema regardless of any overlap between 
the concepts they represent. The names of the selected terms have to follow a specific strategy as specified in 
define resources-naming strategy task. In this stage, all the classes and their related instances are identified. The 
fourth step is to define the classes and develop the class hierarchy. Several approaches can be used for developing 
a class hierarchy: namely, top-down, bottom-up and combination. Most of the ontologies are developed based on 
the top-down approach, which starts from an abstraction of a domain and continues to a concrete level. However, 
it has been argued that the bottom-up approach is more effective as domain modeling is based on raw and evidential 
data instead of theoretical conceptualization. In this research, the top-down approach is used for the reusable 
ontologies and concepts, while the bottom-up approach is selected for the development of the new ontologies. For 
example, the PVICAT was utilized to identify the classes of the key function classes. The researchers used that to 
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develop instances for each class (Table 2). The instances were identified through the engagement of the researchers 
in two projects where BIM-AR are implemented. The fifth step is to define the properties of classes (slots); while 
the sixth step is to define the facets of the slots. The values of slots are described in different facets such as: value 
type, allowed values, cardinality, and other facet features. The value type facet can be described in different value 
types, such as string, number, Boolean and enumerated. The allowed value facets define the range of slot, and the 
cardinality facets define how many values the slot can have. In this research, the string value type is used for 
defining most of the slots for the classes’ properties. Finally, the seventh step is to create the instance of classes in 
the hierarchy. The last three steps are ongoing part of this research. Several interviews and workshops will be 
conducted with end users to identify the instances especially for both the use case and requirement ontology. 


The ARCHI ontology was initially modelled collaboratively using diagram.net, enabling researchers to visualize 
and connect classes and relationships. The resulting UML diagram was exported and converted to an OWL 
ontology using the Chowlk tool. This OWL file was then imported into Protégé for further refinement. Protégé 
provides a platform to construct domain models and knowledge-based systems by enabling the creation of classes, 
properties, and individuals. To develop a robust ontology of knowledge for the BIM-AR framework, certain design 
principles and best practices were followed. As highlighted by Hlomani and Stacey (2014), the ontology requires 
precise conceptualization of the domain knowledge through iterative refinement of definitions. Furthermore, 
Gruber's (1995) criteria of clarity, coherence, extendibility, minimal encoding bias, and minimal ontological 
commitment were adhered to ensure a well-founded ontology. By leveraging Protégé and adhering to established 
guidelines, the ARCHI ontology codifies the concepts and semantics required to represent the knowledge and 
relationships underlying the BIM-AR framework. 


The ARCHI ontology consists of 5 key classes that characterize the problem space from different perspectives: 
Use Case, Requirements, Key Functions, Solutions, and Users (Figure 3). The Use Case class captures details 
about the context and goals of implementing AR into a BIM workflow. This includes the phase of integration such 
as design, construction planning, active construction, handover, or operations. It also covers specific applications 
and objectives, such as visualizing design models on-site, evaluating construction progress, or providing digital 
overlays for facility maintenance. Additionally, it describes the physical environment where AR will be utilized, 
for example a construction site or design office. The Requirements class contains the functional and non-functional 
needs that emerge based on parameters defined in the Use Case. For instance, a construction site application may 
require ruggedized hardware to withstand harsh conditions. Or an operations use case may require integration with 
existing facility management software platforms. Also, it covers the information requirements related to the 
information needed for the 3D models provided by the BIM systems. Based on each use case, we can define a 
Model View Definitions (MVD). Requirements provide a link between goals and necessary capabilities. The Key 
Functions class draws from the taxonomy of augmented reality capabilities synthesized by Amin et al (2023). It 
contains main categories of AR functionality. Each class also has associated instances as shown in Table 2. This 
provides a standardized vocabulary to describe AR features. The Solutions class characterizes the software, 
hardware, and other technological components of existing AR platforms. This includes parameters like software 
packages and versions, types of display hardware, interface modalities, tracking methods, input devices, and 
capabilities for data output or export. Lastly, the User class models the human users of the AR system. Both 
solution and user classes leverage existing ontologies related to human-computer interaction to fully define user- 
solution side factors. These 5 classes provide a structured foundation for evaluating and selecting optimal AR 
solutions. Use Cases define goals, Requirements outline needed capabilities, Key Functions provide a vocabulary 
of AR features, Solutions characterize technologies, and Users represent the human perspective. By mapping Use 
Cases to Requirements and Key Functions, then matching those to candidate Solutions while considering Users, 
the ontology enables principled assessment of how well a given AR platform suits a particular BIM use case need. 
This facilitates both targeted selection of existing tools and identification of areas requiring new solutions. 


In summary, ontology-based modeling of the BIM-AR solution space can enable richer representations of end user 
perspectives, workflows, requirements, and project contexts. This knowledge base can then drive the circular 
participatory framework by connecting BIM-AR capabilities directly to construction practices and stakeholder 
needs. Additional research is underway on ARCHI's formal ontology development and its applications for guiding 
successful BIM-AR adoption. 
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Figure 3: The main classes and relationships of ARCHI ontology. 


Table 2: Key function classes and their instances and mapped solution capabilities 


Key Function Class Key Function Instances Capabilities for the Key Function Instances 
Positioning Marker-based Scan Marker 

Positioning Natural Feature-based Scan Natural Feature 

Positioning Object-based Scan Object 

Positioning Manual Mapping Map Coordinates 

Interaction Modify Move, Rotate, Resize, Delete 

Interaction Retrieve Information Read, Download 

Interaction Store Information Capture Image, Capture Video 

Interaction Add Information Comment, Markup 

Visualisation Digital-Digital Inspection Identify Clash, Identify Defect 

Visualisation Visibility Customization Show, Change Color, Change Appearance 
Collaboration Issue communication Upload Image, Upload Video, Stream 
Automation Visual Inspection Class detection, defect detection 

Automation Report Generation Progress report, clash report, defect report 
Integration Production and Programme control Real-time integration with other systems 
Integration Presentation of external datasets Real-time integration with weather and others 
Integration Improve/Extend existing function API capabilities to extend functions 
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6. CONCLUSIONS 


The BIM-AR Framework and ontology presented in this research offers a novel user-centric approach for 
designing and deploying augmented reality systems. Rather than taking a software-driven approach focused on 
tools like Unity, this framework emphasizes human-centered design with a focus on AR content. The core of this 
approach is a 5-step circular hybrid process that continuously evolves based on user needs and perspectives. To 
facilitate the sharing of information between process stages, the ARCHI ontology was developed to capture key 
data points and relationships. This human-focused approach represents a paradigm shift from traditional AR design 
methodologies. By putting the user at the center and iterating based on their requirements, the framework enables 
the development of more intuitive and purpose-driven AR applications. The ARCHI ontology also plays a key role 
by codifying knowledge to prevent information loss across the design lifecycle. Overall, the framework aims to 
create a more seamless AR experience by enhancing the symbiotic relationship between user and technology. 


While initial expert review and case studies demonstrate the potential of this BIM-AR approach, further validation 
is required. Future work should concentrate on gathering additional use cases across different domains to refine 
the framework. More robust testing and evaluation of the ontology is also needed to ensure it adequately captures 
the necessary design knowledge. Extending the current ontologies with linked open data could also strengthen the 
knowledge-sharing capabilities. With further development and validation, this human-centric methodology could 
provide a new paradigm for AR system design that leads to more adopted and usable AR solutions. 


This research presents a promising user-focused approach to AR design, moving away from software-centric 
methodologies. The BIM-AR Framework and ARCHI ontology provide an integrated solution to put human needs 
at the forefront. While more work is required, this pioneer research lays the foundations for next-generation AR 
systems that emphasize the human perspective over tools. With rigorous validation and evolution, this framework 
can enable AR developers to create more intuitive and purpose-driven applications that deliver value in various 
real-world contexts. The user-centric future envisioned by this research has the potential to transform augmented 
reality technology and cement its place as an integral part of how people interact with information and their 
environment. 
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A REVIEW OF COMPUTER _ VISION-BASED PROGRESS 
MONITORING FOR EFFECTIVE DECISION MAKING 
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ABSTRACT: Construction Progress Monitoring (CPM) is a significant aspect of project management aimed to 
align planned design with the actual construction on site, the process ensures that the project is well within the 
control of the stakeholders involved and ensures the project is completed complying with the construction 
documents, on time, and within budget. Despite how central progress monitoring is to attaining project success 
and advances in technology, the progress monitoring is majorly implemented manually, which requires manual 
retrieving and processing of site data to compare with the planned design. This manual process is both time- 
consuming and prone to errors. Automating the task of progress monitoring involving real-time data acquisition 
and timely information retrieval can assist the project managers for effective decision making to the successful 
delivery of the project. Thus, the objective of this research was to assess the impact of computer vision (CV) — 
based progress monitoring as a driver for effective decision-making in project management. A qualitative 
methodology was implemented for this research using Preferred Reporting Items for Systematic Reviews and Meta- 
Analyses (PRISMA) to review and analyze studies on the application of computer vision (CV). The study reviews 
studies of CV based CPM process, highlighting its benefits against the traditional method of progress and the 
limitation to its adoption. Research findings from this paper provide an increased understanding and have a 
broader scope on the application of computer vision-based progress monitoring. 


KEYWORDS: Computer Vision, Construction progress monitoring, Decision-making, Project management 


1. INTRODUCTION 


Progress monitoring involves the processes required in tracking, evaluating and organizing the performance of a 
project, and identifying areas where modification needs to be implemented (PMOK, 2017). In the development 
phase of construction projects, site activities are tracked by the project manager using progress monitoring methods 
(Qureshi et al., 2022). Progress monitoring of a construction project is essential to the successful delivery of the 
project, this is because it entails recognizing the disparities between the planned design and the ongoing 
construction. As most tasks are interdependent, frequent inspections assist managers to detect anomalies early, 
avoid potential delays, and decide when to take remedial action (Reja et al., 2022). The progress monitoring phase 
is regarded as a complex task, it requires efficiency as it provides the essential inputs to the managers on site for 
prompt and informed decisions. This process, when done effectively helps to prevent cost and schedule overruns 
and improve the retrieval, management and processing of site data (Kopsida et al., 2015). According to Hanet et 
al. (2016), the limitations in manual and other conventional data acquisition procedures in progress monitoring 
cause more than 53% of construction projects to fall behind schedule and more than 66% of them to fall short 
financially. 


The traditional method of progress monitoring of construction projects involves manual retrieving of data, 
information processing, documentation, and reporting on the project status. However, this method is time- 
consuming, information obtained are prone to human errors, and often report obsolete information which impedes 
effective decision-making from stakeholders (Rehman et al., 2022). To improve this, the process can be made 
effective through automation. Technologies exist for the automation of progress monitoring; with focus on 
retrieving data from the site, some of which include unmanned aerial vehicle (UAV), geographic information 
system (GIS), virtual reality (VR), augmented reality (AR), radio frequency identification (RFID), and global 
positioning systems (GPS). However, computer vision technology can be consolidated with these technologies to 
be implemented for progress monitoring. Computer vision (CV) is similar to the human vision, but utilizes 
machine learning algorithms or deep learning models in analyzing, predicting and making useful interpretation 
from data inputs which could be images or videos (Paneru & Jeelani, 2021). 


For the automation of construction progress monitoring, a noticeable amount of research has been carried out. An 
overview study conducted by Ekanayake et al., (2021) on the application of computer vision-based interior 
construction progress monitoring. The study categorized the challenges that hinder the successful implementation 
of CV based interior construction project monitoring (CPM) into indoor objects, lighting condition and movements 
of the camera used. However, the study mostly focused on challenges for interior use. McCabe et al., (2017) also 
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SECTION C - Al, DATA SCIENCE AND ANALYTICS 


investigated on indoor CV-based CPM, their study identified the challenges encountered using UAV’s. for 
automated data retrieval. A related study by Kopsida et al., (2015) categorized the different stages involved in the 
automation process in terms of technology used and assessed, time efficiency, accuracy, cost and mobility. Most 
of the studies highlighted the need to overcome challenges for the successful implementation of a CV-based CPM. 
The objective of this study is to assess the impact of CV— based CPM as a driver for effective decision-making in 
project management. Investigating how this technology can improve decision making, by revealing its current 
level of adoption identifying the benefits, and the limitation involved to its application. 


2. METHODOLOGY/APPROACH 


For this study, a systematic review was conducted. This type of review involves identifying all relevant literature 
that is pertinent to the review question, critically evaluating identified literature and summarizing the findings 
(Gough et al,.2012). It helps to answer an important question or identify areas of importance relevant to the 
research question (Harris et al., 2014). The review begins by posing a research question, identifying relevant 
studies, critically evaluation of the studies, data collection, analyzing and structuring of the data, summarizing the 
evidence and reporting findings from the study (Khan et al., 2003). While the systematic review approach has been 
predominately used to conduct research in the medical field (Munn et al., 2018). It has been used several times in 
the field of construction management; for a review of sustainable construction management (Araújo et al., 2020), 
a review of the concept of buildability as it relates to construction management (Osuizugbo et al., 2022), and a 
review of the inter-relationship building information modeling (BIM) and safety in construction (Martinez-Aires 
et al., 2018). This is because the output from this type of review is usually comprehensive and exhaustive requiring 
an explicit methodology and helps present output in a structured sequence (Shamseer et al., 2015) 


This systematic review was conducted using the guidelines of the Preferred Reporting Items for Systematic 
Reviews and Meta-Analyses (PRISMA) which include four stages: Identification, Screening, Eligibility, and 
Includes (as shown in Figure 1). The databases sources used were Scopus, Web of Science (WOS) and Civil 
Engineering Database from the American Society of Civil Engineering (ASCE). Google Scholar was also used for 
the search of relevant studies. Keywords used in the databases for search include, “Computer vision” and 
“construction progress monitoring,” as well as “Computer vision-based construction progress monitoring.” For 
relevant extant literature, the search range was from the year 2005 upward. Duplicate files, and records having a 
different language from English that could not be translated were also excluded. Also, some papers had a 
methodological approach that did not align with the objective of this study, such papers were screened out. After 
screening and eligibility criteria the total number of papers evaluated and appraised for this study were 47. 


Records Identified from: 


Scopus Database (n=115) Records Identified from: 


Identification WOS (n=50) 


CEDB (ASCE) (n=70) 
Records after duplicates were removed (n=192) 
Records Screened out 
Different Language (n=16) 
= > 
Records Screened (n=192) Unpublished papers (n=23) 


Not directly relate to the objective (n=81 


Google Scholar (n=42) 


Screening 


Articles Reviewed for Articles excluded based on methodology, 
Eligibility focus on traditional method of data 
(n=72)1 analysis (n=25) 


Studies included in this study for qualitative review (n=47) 


Fig. 1: Flowchart showing the systematic review process using PRISMA. 
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F THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


3. ANALYSIS AND RESULTS 
3.1 Description of subprocesses of CV-based CPM process 


In recent times, the application of CV-based CPM in project management has gained traction as the advantages it 
possesses has been observed to aid stakeholders for an effective decision-making process in the lifetime of a project 
(Braun et al., 2020). Different studies categorized the sub processes involved in CV-based CPM. The categories 
include, data acquisition, information retrieval, progress estimation, and output visualization (Kopsida & Vela, 
2015; Rehman et al., 2022). Data acquisition and 3D reconstruction, as-planned & as-built modelling and progress 
monitoring (Reja et al., 2022). This review categorizes the process into three data acquisition, information retrieval 
and progress monitoring and visualization as shown in Figure 2. Each of the subprocess is summarized based on 
review of extant literature. The sequence is such that the data obtained automatically is analyzed to retrieve 
germane information such that there can be a systematic comparison between the as-planned and the as-built 
structure, and the disparities are made known in a comprehensible pattern. This section reviews the subprocesses 
associated with CV-based CPM. 


Analysis & Information 


Data Acquisition Progress Estimation & 


>_>) Retrieval p— Visualization 
- Aerial Systems -Deep Learning -BIM modelling 
- Fixed Systems -Machine Learning -Augmented reality/ 
- Handheld Systems -Photogrammetry Virtual reality 


Fig. 2: Subprocesses of the CV based CPM. 
3.2 Data Acquisition 


Data format for automated progress monitoring include but not limited to two-dimensional (2D) or three- 
dimensional (3D) images, from cameras and depth cameras respectively (Omar & Nehdi, 2016). Videos obtained 
from videos cameras which could be fixed or mobile with the aid of technologies such as unmanned aerial vehicles 
(UAV), or unmanned ground vehicle (UGV). Also, a point cloud which involves tiny points which could be 
plotted for relevance in a 3D space or surface that can be sourced from 3D laser scanners like light detection and 
ranging (LIDAR) (Paneru & Jeelani, 2021a). 


The construction site is known to be very dynamic in nature, consisting of various activities occurring most times 
intermittently (Ibrahim et al., 2009). Thus, the need to have a comprehensive overview that can be augmented by 
using digital images and videos in monitoring construction progress requiring little expertise because of the 
simplicity in its application. Table 1. Shows a summary of the data acquisition subprocess indicating the method 
of acquisition, devices utilized in this method, the benefits and the limitations to its use. 


Table 1: Computer Vision Data Acquisition 


aon Devices Benefits Limits Ref. 
method 
- Requires expertise and 

- Provides a detailed coverage of certification to operate 
per UAV’s a Pee eer - It could be expensive (D. Kim et al., 
wien, “Meet “edie ommualiyarieaes eaten A ay Moa 
y with sensors j : ay aa specification . et al., 2017) 

- Can be integrated with sensors like - Precision when use is 

cameras and laser scanners required 

- Provides stability for more clarity ` Restricted to one angle of 

Fixed Sytems Surveillance in data input obtained VEW (Benyeogor et 
Saye cameras - Adequate for sustained period of - Not efficient for al., 2020) 


data acquisition comprehensive coverage 
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- Portable and handy making it - Requires multip le takes to 
comfortable for use cover the entirety of (Jeon et al., 


Mobile - Provides flexibility in use, no construction space 2006; Mahami, 


Handeld cameras, DT . S 3 i i i . 
systems tablets restriction in getting elevations le obtained is subject to Nasirzadeh, 
i smartphones 22 angles ees And Ahmadabadian, 
p - Best for use in getting up-close accessibility. et al., 2019) 


data for clarity 


3.3 Information Retrieval & Analysis 


The data retrieved in form of images and videos needs to be analyzed in order to obtain useful information, which 
will be vital for progress estimation (next sub-process). Some of the commonly used methods includes traditional 
machine learning technique which includes support vector machines, Hough transform, artificial neural network 
and deep learning techniques using deep convolution neural networks (CNN). Zhu et al., (2010) proposed a novel 
technique using Hough Transform Technique to analyze 373 images of large-scale concrete columns for inspection 
of surfaces to detect defects, the technique when evaluated performed well with a high precision of 89.7% and 
recall of 84.3%. Kim etal., (2013) presented a method for measuring construction progress based on information 
included in 4D building information modelling (BIM) and 3D data, the data was classified using support vector 
machines (SVM) classifier, the SVM model was trained using a labelled 3D data. Because the initial as-built 
statuses of some components may be inaccurate as a result of an incomplete 3D data set, a two-stage revision was 
performed. The first stage was based on the sequence of activity execution and the second stage was based on the 
connectivity between components. Both the sequence of activity execution and the connectivity between 
components were stored in the BIM. The final as-built statuses produced by this process may be used to determine 
actual finish dates and to measure actual construction progress. The accuracy of the proposed method was validated 
using an incomplete set of 3D data acquired on an actual construction site achieving 99% precision rate on the 
second revision. Additionally, Wang et al., (2021) proposed a vision-based framework for monitoring precast walls 
during construction, using convolution neural networks (CNN) based computer vision method including Mask R- 
CNN and DeepSORT to realize object detection, instance segmentation and multiple objects tracking on the dataset 
obtained from surveillance cameras. The output from the study confirmed the detection rates of CNNs are fast 
compared to other techniques, this agrees with studies from (Paneru & Jeelani, 2021b; Sultana et al., 2018). Other 
relevant analytical methods include, Simultaneous Localization and Mapping (SLAM) (Kim et al., 2018), 
Structure From Motion (SFM) (Mahami et al., 2019),) Histogram Oriented Gradients (HOG) (Memarzadeh et al., 
2012 ,and Laplacian of Gaussian (LoG) (Hui & Brilakis, 2013). Table 2. Shows a summary of the information 
retrieval and analysis subprocess indicating the analytical method, models utilized in the method, the benefits and 
the limitations to its use. 


Table 2: Information Retrieval & Analysis. 


Analytical 
PAVER Model Benefits Limitations Ref 
method 
HOG, LoG, PEREN - Requires large dataset 
Traditional SURF: -Hough - Reliable and his i accuracy when po E RE (Aia a E 
Machine Learning transform, : ; 8 TIT and feature extraction of al., 2010) 
trained with balanced dataset. data 
SVM 
- Models can learn patterns from 
Mask R-CNN, data in high speed - Requires high end (Z. Wan 
; DeepSORT - Models trained usually have hardware for processing. , 8 
Deep learning : ‘ et al., 
accuracy - Requires large dataset fir 2021) 
- Requires no manual feature better analysis, 
extraction or engineering 
- Usually cost effective in A T: 
comparison with laser scanners - Long processing time (Kim et 


Photogrammetry SFM, SLAM - Requires hardware with al., 2018) 


- Flexibility of use on construction 
sites 
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3.4 Progress Estimation & Visualization 


This subprocess makes use of the information retrieved by analyzing the information retrieved. This process is 
usually a comparison between the as-planned model and the as-built model. The comparison is also known as 
registration as identified from various literature (Kopsida & Vela, 2015; Rehman et al., 2022). The output is 
significant for project controls as it gives an update on the project schedule; if the project is on schedule or behind 
schedule by showing the extent of construction which has been put in place on site (Reja et al., 2022). The result 
of this comparison is necessary in identifying the successive steps which the stakeholders can take in order to meet 
the project’s objective. The concept of building information modelling (BIM) is very prominent in this subprocess, 
as the format of the as-planned model can be presented in the four-dimensions 4D BIM model for comparison. 
After the comparison process, a matching process to see the disparity between the observed and the planned is 
conducted. The use of voxels, object matching, and probabilistic model have been used to detect progress. This 
progress is visualized using technologies which enable immersion such as Augmented reality and virtual reality 
(Ahmed, 2019). Several studies have identified the use of AR and VR as better visualization tools for progress 
monitoring (Omar & Nehdi, 2016; Rohani et al., 2014). In a case study, Meža et al., (2015) conducted a survey 
comparing AR with traditional visualization techniques like Gantt charts, AR ranked highest in “understandability 
of project documentation in monitoring of construction” and “usability of project documentation in monitoring of 
construction”. Wang et al., (2013) proposed a framework for integrating BIM with AR; the platform which is able 
to couple BIM and AR so that information about ‘as-built and as-planned progress’ as well as ‘current and future 
progress’ can be obtained and presented visually. Comprehensively, consolidated studies have identified AR and 
VR technologies to be ideal visualization techniques in progress monitoring, as they have shown to facilitate 
understanding of construction progress estimation. 


3.5 Cost and Time-factor of Traditional and Computer Vision-based Progress 
Monitoring 


In project management, cost and time are very significant factors that are used in defining the success of the project, 
and mangers are constantly seeking ways to optimize cost, be on schedule and to meet standard requirements of a 
construction project (Chan et al., 2004; Luong et al., 2021). They are also very important criteria which managers 
and stakeholders consider during decision-making in construction. Hence, it is imperative to include these 
parameters for the comparison of CV-based CPM and traditional progress monitoring. The cost of setting up a CV 
based CPM is relative depending on the devices used in each subprocess. In comparison to the traditional method 
which requires no automation or negligible technology, is perceived as an expensive process. Expenses including 
purchase of equipment, software, maintenance cost, technical support personnel and the training of users (Omar 
& Nehdi, 2016). Additionally, numerous researches have shown that CV based CPM is a time-saving and efficient 
process (Golparvar-Fard et al., 2009; C. Kim et al., 2013). In a case study conducted by Braun et al., (2020), 
using deep learning technique with sfm-based data consisting of categorized images of formwork, scaffolding and 
columns, a real time comparison between the as-planned and as-built, to detect progress of site activities was 
achieved in real time. When evaluated, the method produced a high precision of 90% in detection rate, enormously 
saving time in the process. 


3.6 Comparison between Traditional progress monitoring and CV based CPM 


In this section, the CV-based CPM was compared with the traditional progress monitoring using relevant indices 
which can assist stakeholders when making decision on both methods. Table 4 shows summary was obtained 
from the systematic review of literature on the application of both methods. 


Table 4. Summary of the comparison between CV-based CPM and Traditional progress monitoring 


Evaluation Criteria CV-based CPM Traditional Progress Monitoring 
Data Acquisition Reliable and timely Depends on the sense of judgement of the 
personnel executing the task 
: 3 Requires more of human input, personnel 
f : Requires expertise from the personnel . ; 
Information retrieval & . i ; involved in the process needs to be properly 
. performing the analysis, mistakes can be : ‘ i ; 
Analysis : trained to avoid errors, which will lead to 
spotted and quickly 


impact on the project. 


Progress monitoring & 
visualization 


The use of BIM/virtual reality/augmented 
reality gives a sense of realism and immersive 
and provides sturdy detail of the project. 


Doesn’t provide the realism and immersion 
that CV based CPM provides. Output might be 
difficult to interpret. 


Cost 


Process can be expensive, especially cost of 
hardware, software and training 


Cost relatively cheaper when compared to CV 
based CPM 
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Generally, saves time, notable for efficiency in 


Time ; 
site management. 


Time consuming process 


3.7 Limitations to the application of CV-based CPM 


Despite the benefits which CV based CPM process offers to stakeholders to enable effective decision making, 
there still exist some barriers which leads to the hesitation to its adoption. Some of these limitation include; lack 
of technical expertise on software and hardware, adverse weather conditions, occlusions, specifications of data 
acquisition device. These limitations were grouped into three which include, environmental factors, technical 
factors and human factors. 


Environmental factors are the barriers within the site location which prevent successful application of CV-based 
CPM. This includes, the impact of weather, for automating the data acquisition subprocess. The impact of the 
weather isn’t negligible as adverse weather condition distorts the quality of the image which inadvertently leads 
to poor analysis of the input (Omar & Nehdi, 2016). Poor lightning condition is also significant in the processing 
of input data. Hamledari et al., (2017) proposed a framework that automatically detects components of an interior 
partitions using 2D images, their study inferred on the significance and impact of good lightning in order to achieve 
good results on the detection of site objects. Other environmental factors include, air quality, site condition, due to 
varying activities, certain sites may be too clustered leading to data acquisition device hindered view of the entirety 
of work site space. 


Technical factors include the factors related to the technology implemented in each of the subprocess. Some are, 
the specification of data acquisition devices, image and video capturing devices, knowledge on the appropriate 
devices for the type of data required. Also, for the information retrieval, knowing the proper analysis on the data 
can be challenging, certain techniques require a lot of data to be trained in order to give a desired output (Moragane 
et al., 2022). Aerial systems like the UAVs require certifications, and a level of technical knowledge to operate, 
this can be challenging especially if it’s a small-scale project involved. 


Human factors are largely critical to the successful implementation of CV-based CPM. Barriers such as privacy 
issues, and reduced creativity from workers due to the displeasure caused by the feeling of being monitored 
(Ibrahim et al., 2009; Moragane et al., 2022). Despite this method being an automated process, certain subprocesses 
require the input of personnel in order to operate. For example, at the information retrieval subprocess, the 
technical know-how of the personnel executing the task is significant and as such requires requisite training 
(Paneru & Jeelani, 202 1a). 


3.8 Intellectual Merit / Broader Impact 


The intellectual merit of this work is how it reviews CV based CPM, highlighting on the process involved, 
developing an evaluation criterion to compare CV-based CPM with the traditional monitoring process, also 
identifying limitation to its application in project management. The broad impact of this work is will yield an 
increase understanding of CV based CPM by as an alternative to the traditional monitoring process, as its current 
level of adoption is still at its nascent stage. A simple holistic understanding of the process by stakeholders can 
assist in a more informed decision towards project monitoring in project management. 


3.9 Conclusion 


In project management, construction progress monitoring is a very significant process in achieving a successful 
project delivery. However, most projects still undergo the traditional manual progress monitoring process. The 
process has been identified to be time consuming, error prone and subject to the bias and technical know-how of 
the personnel involved in the process, and this concern leads to the need to automate the process. The objective of 
the paper was to assess the impact of CV based CPM as an effective decision making tool by highlighting key 
subprocesses involved in the process, including listing an evaluation criteria for comparing the CV based CPM 
with the traditional manual monitoring process, and to identify barriers to its adoption as a project monitoring tool 
in project management. A systematic review was conducted, to evaluate literature relating to the topic. Databases 
from Scopus, WOS, ASCE and Google scholar were sources of data for the review, PRISMA analysis was used in 
screening all the papers in order to ascertain relevant literature for this study. 


The outcome of this study gave a concise description of subprocesses associated with CV based CPM process. 
Images and videos are currently the most utilized data in this process and this is most common because of its ease 
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of accessibility and availability of processing systems. Aerial systems which include the use of drones and 
augmented reality googles show great potential as an effective data acquisition device. Deep learning technique 
using CNN due to its speed in detection can be integrated with aerial systems for an effective monitoring system 
with real time output. Lastly, the study highlighted limitations for the applications of CV based CPM and 
categorized these as human environmental and technical factors, and the review identified technical factors to be 
a significant factor among others. In short, this study is important because it provides a simple holistic 
understanding of the process thus aiding stakeholders with accurate knowledge for decision-making towards CV 
based CPM in construction project management. 
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ABSTRACT: The construction industry stands to greatly benefit from the technological advancements in deep 
learning and computer vision, which can automate time-consuming tasks such as quality control. In this paper, we 
introduce a framework that incorporates two advanced tools - the Visual Quality Control (VOC) tool and the 
Digital Twin visualization with Augmented Reality (DigiTAR) tool - to perform semi-automated visual quality 
control in the construction site during the execution phase of the project. The VOC tool is a backend service that 
detects potential defects on images captured on-site using the Mask R-CNN algorithm trained on annotated images 
of concrete and railway defects. The surveyor, aided by the Augmented Reality (AR) technology through the 
DigiTAR tool, can in-situ confirm/reject the detected defects and propose remedial actions. All the quality control 
results are recorded in the relevant BIM model and can be viewed on-site overlaid on the physical construction 
elements. This solution offers a semi-automated visual inspection that can speed up and simplify the quality control 
process, especially in case of large linear infrastructures, illustrating the added value of AR-based applications in 
Digital Twins. 


KEYWORDS: BIM, Augmented Reality, AR in Construction, Deep Learning, Computer Vision, Visual Inspection, 
Digital Twins 


1. INTRODUCTION 


A prominent challenge in the construction industry is the ability to swiftly and seamlessly adapt to changes. To 
address this issue, an effective approach involves harnessing the power of computer-aided tools that can replace 
time-consuming activities. By integrating such tools into construction processes, valuable time and effort are saved, 
leading to significant cost reductions. By introducing digitalized processes to handle repetitive and labor-intensive 
tasks, construction projects can enhance their adaptability and responsiveness to changes. This allows teams to 
allocate their resources more efficiently, enabling them to focus on more critical aspects of the project. Furthermore, 
the digitalization of manual processes and the use of machine learning algorithms facilitate faster decision-making 
and reduce the likelihood of errors, as they can process vast amounts of data accurately and consistently. The 
increased accuracy and efficiency provided by these tools contribute to improved project outcomes and overall 
productivity. 


This study introduces a semi-automated approach for visual quality control during the execution phase of 
construction projects. The proposed framework leverages recent technological advancements in deep learning and 
computer vision. It is designed to incorporate two essential components: the Visual Quality Control (VQC) tool 
and the Digital Twin visualization with Augmented Reality (DigiTAR) tool. The VQC tool serves as a backend 
service; it incorporates a deep learning network trained to detect concrete and railway defects in construction site 
images. The DigiTAR tool harnesses the power of AR technology to provide a unique visualization experience of 
the BIM model. Through DigiTAR, users can immerse themselves in the construction site and witness the 3D 
Building Information Modeling (BIM) model in real-time, where the digital BIM model components are overlaid 
onto the physical components. DigiTAR is responsible for visualizing the VQC results on-site. This means that 
key stakeholders, such as the project manager and quality manager of the construction project, can conveniently 
review and confirm these results firsthand. Their confirmation of these VQC results is pivotal, as it determines 
whether additional remedial works are assigned to the components identified in the VQC data. By having access 
to such crucial data on-site, decision-making processes can be expedited, and effective collaboration among 
stakeholders is further enhanced. 


The novelty of the proposed approach in comparison to other existing solutions is that it simultaneously allows: 
(1) a collaborative inspection of construction sites (different inspectors, both in-situ and asynchronously); (2) 
different types of annotations (texts, strokes, images, 3D models); (3) geolocated annotations (related to specific 
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elements of the virtual BIM model); (4) the monitoring and editing of registered annotations; and (5) the in-situ 
visualization of both the designed and the actual state of the building by means of the AR technology. By 
incorporating these novel features, the proposed approach significantly improves the inspection process, fosters 
collaboration among stakeholders, and ensures higher-quality construction outcomes. This comprehensive and 
innovative approach addresses critical challenges faced in the construction industry, promoting efficiency and 
excellence throughout the project lifecycle. 


The rest of the paper is structured as follows. In Section 2, related work is surveyed, focusing on: i) visual 
inspection methods of railways using deep learning techniques and ii) AR approaches for on-site construction 
inspection. In Section 3 we present the bundle of the quality control tools in detail, addressing its design and 
technological and implementation issues. Section 4 provides the results of the evaluation process and a case study 
demonstration example of utilizing the proposed framework in a real environment. Finally, the conclusion section 
summarizes the main findings. 


2. RELATED WORK 


In this section, the relevant literature review is presented. Firstly, we focus on the automated inspection of 
construction sites using mainly deep learning techniques. Secondly, research approaches concerning the 
construction sites inspection with the use of AR technology are briefly presented. 


2.1 Visual inspection using deep learning techniques 


In recent years, deep-learning algorithms have shown remarkable performance in image object recognition and 
Convolutional Neural Networks (CNNs) have attracted wide attention as an effective recognition method. CNNs 
have been applied successfully to detect structural damages. Many studies have been conducted focusing on binary 
classification issues, such as crack detection (Brien et al., 2023), including additional estimations regarding the 
depth of the crack (Laxman et al., 2023) or the width of the crack (Meng et al., 2023). In addition, multiple surveys 
have been focused on crack detection and segmentation (Attard et al., 2019; X. Xu et al., 2022), corrosion detection 
(Atha & Jahanshahi, 2018; Papamarkou et al., 2021), bughole detection (F. Wei et al., 2019), and multi-damage 
detection (Cha et al., 2018; Kumar et al., 2021). 


Focusing on railways inspection, most of the studies examine the defects on the railway track lines due to the long- 
term pressure from train operations and direct exposure to the natural environment, which have a direct impact on 
the safety of train operations (Cao et al., 2020; Guo et al., 2021; Liang et al., 2019; Zhang, Liang, et al., 2021). In 
(Gan et al., 2017), an automatic inspection system for rail surface discrete defects due to fatigue was created and 
tested, extending the literature review with the Rail Surface Discrete Defects (RSDD) dataset. The Rail-5k dataset 
(Zhang, Yu, et al., 2021) includes the thirteen most common types of rail defects and is considered a benchmark 
dataset for rail surface and fastener defects. In (Zheng et al., 2021), a multi object detection method based on deep 
CNN is proposed, achieving a non-destructive detection of rail surface and fastener defects. In this method, rails 
and fasteners on the railway track images are firstly localized by YOLOvS. Then, surface defects of the rail are 
detected and segmented based on Mask R-CNN (He et al., 2017), while a ResNet framework is used to classify 
the state of the fasteners. In (X. Wei et al., 2019), the authors compare different methods for fastener defect 
detection and recognition, concluding that with the Faster R-CNN the fastener positioning and recognition can be 
carried out simultaneously. (Y. Xu et al., 2021) proposed a novel method for tunnel defect inspection (such as 
leakage and spalling) based on the Mask R-CNN. The network was modified appropriately (extra feature pyramid 
network and edge detection branch) to achieve a higher accuracy in tunnel defect detection and segmentation. (Xue 
& Li, 2018) proposed a fully convolutional network (FCN) model for automatic classification and detection of 
tunnel lining defects (such as leakage, crack, and segment joint). The authors compare their proposed method with 
traditional convolutional networks (such as VGG) and Faster R-CNN, concluding that the proposed model is very 
fast and efficient. In (Xue et al., 2020), a deep learning-based model for automatic calculation of the water leakage 
areas of a shield tunnel surface is proposed. Optimization measurements, such as data augmentation, transfer 
learning, and cascade strategy, were adopted to improve the performance of the original model. 


In conclusion, many of the existing studies focus on concrete surfaces and tackle the issue of binary classification 
(e.g., crack/non-crack). To the best of our knowledge, studies that concern multiclass classification and detection 
focus mostly on long-term concrete defects. In addition, they mainly refer to bridge or rail track deterioration and 
defect detection. 
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2.2 AR for construction inspection 


In (Garcia-Pereira et al., 2020) an AR-based tool is developed for the inspection of prefabricated buildings. The 
tool has been evaluated positively, as it allows collaborative inspection, supports multi-type, geolocated 
annotations, and in-situ augmented reality visualizations. (Chi et al., 2022) present a method, which combines AR 
and laser-scanning technologies to provide intuitive and accurate rebar inspection. The as-built (point clouds) and 
as-planned data are compared to provide discrepancy information for the inspectors. With the AR, the user is able 
to visualize the rebar inspection outputs and provide rework instructions. (Zhou et al., 2017) propose an AR-based 
method to rapidly inspect segment displacement during tunneling construction. The quality inspector is able to 
overlay the baseline model, which is established according to the quality standard, onto the real structure and 
measure the differences between them. In (Kwon et al., 2014), a defect management system for reinforced concrete 
work is presented, utilizing BIM, image-matching, and AR. The authors developed two separate applications: an 
image-matching system for quality inspection without visiting the construction site (by comparing the 2D images 
from the BIM model with the real on-site images) and a mobile AR application for workers and managers to detect 
dimension errors/omissions on-site, in order to save time and reduce rework costs. 


The proposed framework combines the automated image-based visual inspection, powered by advanced deep 
learning techniques, with AR on-site visualization and confirmation of the QC outputs. The scope is to provide an 
efficient solution that not only saves time, but also prevents chained construction error and reduces the need for 
costly reworks during the construction phase. 


3. MATERIALS AND METHODS 


The work presented in this paper is developed as part of the COnstruction phase diGItal Twin mOdel (COGITO) 
project (COGITO Project, n.d.). The COGITO project offers, among others, a bundle of tools for conducting a 
semi-automated visual quality control during the construction phase of large linear infrastructures (especially 
railways) aiming at minimizing the effort and the time usually needed for on-site visual inspection. 


Within COGITO, an image-based inspection system is developed, complemented with AR visualization and 
interaction. Firstly, as-built data (2D images) are acquired on-site using various capturing devices, such as 
smartphones, cameras, and AR devices. Secondly, the acquired images are processed (e.g., cropping, resizing) by 
a dedicated Visual Data Pre-processing tool. At this point, each processed image is linked to a specific QC task 
and to the respective BIM elements depicted in the image. In the third step, the data are forwarded for the automated 
visual quality control. Since each image is linked to a specific element of the BIM model, the quality control results 
and the detected defects are also linked to elements of the BIM model (fourth step). Therefore, the inspector is 
able to visualize and confirm the QC results on-site using AR with each QC result pinned on the corresponding 
BIM element (fifth step). The inspector can either confirm or reject each detected defect and propose a rework or 
a mitigation work, if needed (sixth step). Finally, workers perform the proposed remedial works (seventh step). 
Since the defects, as well as the proposed reworks, are recorded to the BIM model, the defect management is 
facilitated, resulting in cost and time savings during the construction phase. The overview of the COGITO Visual 
QC framework is presented in Figure 1. 
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Figure 1: COGITO QC workflow 
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3.1 On-site Data Acquisition 


Since it is necessary to capture images or videos of specific new as-built elements, various means of data 
acquisition can be utilized, such as cameras, mobile phones, drones and/or AR glasses (Microsoft HoloLens2, n.d.). 
Regardless of the means used for this purpose, some generic guidelines should be followed during this procedure, 
in order to achieve successful automated quality control and optimize the quality of the obtained results. More 
specifically, the images should be approximately 1000 x 1000 pixels, without spray markers or other signs that 
may affect negatively the QC results. They also need to be close shot and clear (not too generic or blurry) and the 
lighting conditions should be appropriate to ensure that the desired element is visible in the image. In case of video, 
the captured video will be automatically converted into a panorama image during the data processing phase. 
However, the video duration should be approximately five seconds (less than eight seconds) in order to generate 
an appropriate panorama image. In addition, a straightforward path should be followed while capturing. It is 
recommended to avoid rotating, shifting, sliding back or maneuvering. Finally, all the conditions for image 
capturing (close shot and clear, sufficient lighting conditions, without spray markers) should be also applied in 
case of video capturing. 


3.2 Visual Data Pre-processing 


After the on-site data capturing, the images need to be prepared and uploaded for the automated quality control. 
Within the COGITO project, this can be achieved both via a Pre-Processing Desktop application or the DigiTAR 
application in-situ, if the images are captured with a mobile phone or with HoloLens 2, respectively. The images 
should be linked to a specific QC task and BIM element before processing. The image processing includes filter 
application, such as modifying the contrast or the brightness of the photo and resizing or cropping it to focus on 
the region of interest. The aim of preprocessing is to prepare the image for the automated quality control. In case 
of uploading a video, a respective panorama image is generated automatically and the user is able to process it in 
a similar way to the normal images. Once all the desired data (images and videos) have been processed and related 
to a QC task, they are forwarded for the automated quality control. In Figure 2, the COGITO visual data pre- 
processing workflow is depicted. 
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Figure 2: COGITO Visual Data Pre-Processing workflow 
3.3 Automated Quality Control 


The preprocessed visual data are forwarded for the automated quality control. Since the scope of the COGITO 
solution is to perform an automated quality control during the construction phase of new large linear infrastructures 
(especially railways), the VQC tool has been specifically designed to serve this purpose and the chosen defect 
classes for the algorithms are tailored to address construction-related issues, rather than covering defects attributed 
to aging of materials. Based on the deep learning algorithms employed, the VQC tool is able to detect defects, 
which are likely to occur during the railway construction, on concrete and steel elements. More specifically, in 
case of concrete surfaces, the system is able to detect cracks or honeycomb defects, while in case of railway steel 
elements, it detects missing clamps, missing screws, and missing screw nuts. The defect detection includes both 
the object detection and semantic segmentation. The goal of object detection is to classify individual defects and 
localize them using a bounding box and the goal of semantic segmentation is to distinguish the defects at the pixel 
level. 


3.3.1 Dataset Preparation 


For the concrete case, a dataset with concrete cracks and honeycomb images was built. The images have been 
combined from (Crack Segmentation Dataset, n.d.) and (Concrete Crack Segmentation Dataset, n.d.). Furthermore, 
additional data (with high resolution and image size) captured by Unmanned Aerial Vehicles (UAVs) were used. 
The original large UAVs images were divided into several smaller images using a Python script. For the steel case, 
a dataset was built using images collected from an above ground railway construction site in Munich. The images 
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depict three types of defects that can occur during the railway placement: missing clamp, missing screw, and 
missing screw nut. The data were resized to be consistent and have fixed dimensions (1.e., 1024 x 1024 pixels). 
Since the scope of the project is the automated quality control in railways, underground areas are likely to exist. 
Therefore, an offline data augmentation was performed to the images, in order to reduce the brightness and 
simulate the tunnel lighting conditions. For the concrete dataset, 1970 images were used in total to train the model 
for detecting the two aforementioned types of concrete defects. The ratio of the training and validation sets was 
almost 4:1; the training and validation sets comprise 1544 and 426 images, respectively. For the steel dataset, 2195 
images were used in total to train the model for detecting the railway joint defects. The ratio of the training and 
validation sets was almost 5:1; the training and validation sets comprise 1720 and 493 images, respectively. The 
annotation of the dataset is an important and fundamental step. The image label tool LabelMe (Russell et al., 2008) 
was used to label the masks of the objects in both cases (concrete and steel elements). 


3.3.2 Transfer Learning Implementation 


The VQC tool is designed to detect defects on concrete surfaces and in steel railway elements. For this purpose, 
two different models (for concrete and steel case, respectively) have been trained using the Matterport’s 
implementation of Mask R-CNN for TensorFlow2.0 (Abdulla, 2017). Mask R-CNN is an extension to the original 
Faster R-CNN, by adding a branch for predicting segmentation masks on each Region of Interest (RoI) using an 
FCN, in parallel with the existing branch for classification and bounding box regression (He et al., 2017). Therefore, 
Mask R-CNN not only outputs a class label and a bounding box, but also a binary mask for each detected object. 
The network was trained with a learning rate of 0.001, momentum of 0.90, and weight decay of 0.0001. ResNet50 
was used as a backbone architecture. The IMG_SIZE and the TRAIN ROIS PER IMAGE parameters were set 
to 512 and to 80, respectively. The RPN ANCHOR SCALES parameter was set to (16, 32, 64, 128, 256). The 
value of MAX _GT_INSTANCES and DETECTION MAX INSTANCES parameters were set in both cases to 5. 
Since a transfer learning technique was applied, the COCO dataset was used to pre-train the network and initialize 
its weights. Finally, only the head layers were re-trained and fine-tuned on the appropriate datasets. 


The configuration of system environment was Python 3.8, Keras 2.4.3, TensorFlow 2.4.1, CUDA 11.0, and 
CUDNN 8.0.5 on a computer with a NVIDIA GeForce RTX 3080 GPU and a Core i7-10700 @2.9GHz CPU, with 
32 GB RAM memory. 


3.4 AR Visualization 


The QC results obtained by the automatic visual quality control are visualized on-site with the DigiTAR tool, in 
order to be confirmed by the relevant stakeholders, such as the project manager and the quality manager of the 
construction project. Based on their decision, additional remedial works can be assigned to the components 
included in the VQC results. In addition, the DigiTAR tool enables the AR visualization of the BIM models. The 
user is able to view the 3D BIM model on-site, i.e., view the 3D BIM elements overlaying the physical elements. 
The workflow of the QC results confirmation process within DigiTAR is depicted in Figure 3. DigiTAR is 
developed using the Unity 3D Game Engine and is specifically optimized to operate on (Microsoft HoloLens2, 
n.d.) devices. In Section 3.4.1, the BIM model visualization functionality of the DigiTAR tool is described in detail, 
while the registration process of the BIM model is described in Section 3.4.2. Details for the visualization of the 
relevant QC results and the data acquisition functionality of DigiTAR are enclosed in Sections 3.4.3 and 3.4.4, 
respectively. 


lig 


` 3D BIM model visualization __ 


A = N 


\_ 3D Quality Control Visualization _ e 


Checked QC Results 
and Remedial works 


“a 
Ps) VISUAL QUALITY CONTROL RESULTS CONFIRMATION WITH DIGITAR 


Figure 3: DigiTAR BIM model visualization and QC results confirmation workflow 
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3.4.1 BIM model visualization 


BIM model visualization is a key functionality of the DigiTAR tool. To enable this functionality, DigiTAR requires 
as input the BIM model of the construction site in an Industry Foundation Classes (IFC) format. Additionally, the 
tool needs the 3D geometry representation of the BIM model. The geometry representation of the BIM model is 
achieved through the transformation of the IFC file to a file format supported by the Unity Game Engine, such as 
the OBJ file format. 


The IFC parsing process in DigiTAR involves importing the IFC and OBJ files, extracting the IFC data, and 
mapping those data to the 3D model. This process is handled by custom C# classes based on the Xbim library 
(Lockley et al., 2017). The IFC parsing process is implemented by recursively querying and retrieving data from 
the IFC file for the elements of the IFC file using the IFC Schema. A GameObject is generated for each IFC 
element and parent/child relationships are established based on the hierarchical relationships of the elements in the 
IFC file. Upon completion of the IFC parsing, the result is a hierarchical structure, where each GameObject has 
its own IFC properties extracted in a dedicated C# class. 


3.4.2 BIM model registration 


After visualizing the BIM model, the next crucial step in the DigiTAR tool is registration, which involves aligning 
the 3D model of the construction site to the actual site. Within DigiTAR, registration relies on image targets using 
the Vuforia SDK (Vuforia Engine, n.d.). An image target is an image that the application running on HoloLens will 
detect and track. This image will be the link between the static 3D world (BIM model) and the real world. 


The image target is printed and positioned at a location in the real world, ensuring that it is accessible to the person 
wearing the HoloLens. At the same time, an identical image is placed in exactly the same spot in the 3D BIM 
model. To enable the detection of the image target, the user uses speech command “Scan for marker”. This way, 
the data captured by the HoloLens sensors and cameras are utilized by DigiTAR for image target detection. More 
specifically, features are extracted from the HoloLens camera stream and are compared to the reference features 
already extracted from the image target. In the context of pattern recognition, the features that are extracted in 
advance from the image target constitute the pattern that the algorithm searches across the continuous flow of data 
streams. When the person wearing the HoloLens looks at the image target, the features extracted from the data 
stream of HoloLens are matched to the pattern of features belonging to the image target. Therefore, the image 
target is detected and registration is performed. 


After successful registration of the 3D BIM model and in order to maintain it, the registered 3D BIM model is 
continuously tracked. In the DigiTAR application, the registration of the 3D BIM model is tracked using spatial 
anchors; spatial anchors represent important points in the world that the HoloLens coordinate system keeps track 
of over time. The registered 3D BIM model can be set as a spatial anchor using the dedicated in DigiTAR speech 
command “Anchor model”. This way, the next time the user opens the DigiTAR application, the 3D BIM model 
is loaded aligned to the real world without the need to repeat the registration process. 


3.4.3 QC results visualization 


The QC results are visualized using 3D QC tags that are pinned on the elements of the BIM model that are included 
in the QC result. The QC tags are displayed in Figure 4. To visually notify the user, the color of the tag is indicative 
of the relevant QC results: green if no defect has been detected, red if all QC results have detected defects, and 
orange if the QC results include both detected defects and no defects. 


x 4 x 


VQC result VQC result VQC result 


Figure 4: Visual Quality Control tags are pinned on the involved elements 


Firstly, the QC tag is placed on the center of the element’s bound. When an element with a QC gets in the user’s 
field of view, the QC tag dynamically changes its position and rotation while staying on the surface of the 3D 
element. More specifically, the position of the QC tag is dynamically adjusted to the user’s height, while the 
rotation of the QC tag is dynamically adjusted so that the QC tag is displayed vertically in front of the user. A view 
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of a 3D BIM model with pinned QC tags is depicted in Figure 5. Moreover, if the user selects (using the Hand ray 
gesture on HoloLens!) a 3D element that has a pinned QC tag, the QC tag follows the movement of the user’s 
hand, while staying on the surface of the 3D element. An illustration of this feature is depicted in Figure 6. If the 
user performs the air tap gesture” on a QC tag, the related VQC results are displayed using dedicated AR menus, 
as described subsequently in Section 4.2. 


Figure 5: View of the 3D BIM model with QC tags Figure 6: When selected, the QC tag follows the 
movement of the user’s hand 


3.4.4 Data acquisition and pre-processing 


DigiTAR acts as a data acquisition tool for gathering images on-site to be used for automated quality control. This 
functionality is implemented in DigiTAR using a hand-attached menu’. When the user selects the dedicated 
“Capture Image” button, an asynchronous process is initiated to assess the HoloLens camera stream for photo 
capturing. 


When the user looks at what they want to capture and say “Capture image”, a photo is captured. The photo is saved 
in a folder of the HoloLens device. This folder, exclusively created by DigiTAR, stores only the images captured 
within the tool. This segregation is essential since these photos are accompanied by important metadata, including 
the capture time and the user's position and orientation at the time of capture. The alignment of the 3D BIM model 
with the real world, achieved through the registration process and spatial anchoring, enables precise association of 
the captured images with their corresponding locations in the BIM model. 


After capturing the images, users can perform pre-processing on them before uploading them to be utilized by the 
automated quality control system. For this purpose, DigiTAR establishes direct communication with the backend 
of the Visual Data Pre-processing module. This seamless integration streamlines the process of preparing the 
captured images for subsequent quality control analysis, ultimately enhancing the efficiency and accuracy of the 
entire construction quality management process. 


4. RESULTS AND DISCUSSION 


The first subsection presents and analyses the evaluation process of the trained Mask R-CNN for defect detection. 
In the second subsection, a use case of the overall quality control process is presented, endowed with the in-situ 
results’ visualization and confirmation via the DigiTAR application. 


4.1 Automated Quality Control Evaluation 


The performance of the two models (concrete and steel case) was evaluated using the mean Average Precision 
(mAP), since this metric is often used to evaluate object detection models. Precision is the percentage of correct 
positive predictions for overall predictions. Specifically, mAP is the mean value of average precision (AP) for each 
object class (Guo et al., 2021). The concrete model and the railway model were evaluated for 20 and 10 epochs 
respectively. The mAP for the concrete model reached the value of 0.87, while the mAP for the railway defects 
was calculated 0.95. Figure 7 shows the ground truth and the respective predictions of the proposed models for 
some typical examples. For each example, the generated images contain the label prediction, the confidence level, 
and the respective mask. The label prediction indicates the identified defect type detected by the model. The 
confidence level represents the model's level of certainty or confidence in its prediction. The mask displayed in 


* https://learn.microsoft.com/en-us/windows/mixed-reality/design/point-and-commit#hand-rays 
* https://learn.microsoft.com/en-us/dynamics365/mixed-reality/guides/operator-gestures-hl2#air-tap 
3 https://learn.microsoft.com/en-us/windows/mixed-reality/design/hand-menu 
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the images highlights the specific region or area where the defect has been identified. This visual representation 
allows for a clear understanding of the location and extent of the detected defect within the image. 


b) predictions 


Figure 7: Original images (a) and predictions (b) for crack, honeycomb, missing clamp, missing screw, and 
missing screw nut. 


4.2 Use Case Demonstration 


The case study is focused on a railway line across Munich, Germany. The old line had to be replaced with a new 
one. During the reconstruction phase, the site was checked for cracks, honeycombs and rail defects, such as missing 
clamps, missing screws, and screw nuts. 


Regarding the AR visualization, the IFC and the OBJ files for the railway site were parsed within DigiTAR using 
the BIM model visualization process, which is described in Section 3.4.1. The registration process, that is described 
in Section 3.4.2, was conducted using a strategically positioned image target within the construction zone. Precise 
measurements in meters, obtained from the IFC file, guided the accurate placement of the image target on-site. 
After the registration process was completed, the 3D BIM model became aligned with the actual construction site. 
This alignment allowed for accurate integration of the digital model with the real-world environment. An on-site 
3D BIM model visualization using DigiTAR is illustrated in Figure 8. Figure 9 illustrates the successful 
visualization of the QC outcomes on HoloLens 2 using DigiTAR. 


= 


Figure 8: Screenshot of the 3D BIM model, as visualized on-site with DigiTAR 
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Figure 9: Screenshot of QC tags visualized on-site with DigiTAR 


The surveyor captured images of the new elements using the DigiTAR tool following the procedure described in 
Section 3.4.4. Also, the surveyor processed the images on-site and uploaded them for automatic quality control. 
Utilizing the power of the VQC tool, the uploaded images underwent comprehensive assessment, generating 
valuable results. These results were then promoted to the DigiTAR tool for on-site inspection and confirmation. 
By performing the air tap gesture on a QC tag, an overview of the related Visual QC results was displayed to the 
surveyor, as depicted in Figure 10. 


Visual Quality Control Results Job 533- Defect 1 


IFC Element Predicted label: Crack 


Type: lfcWall 
Name: Basic Wall:Generic - 400mm:34798 
Globalid: 3yLfglqiXA_uP4_qbK5gFu 


Visual Quality Control Results 


Confidence level: 0.995 
Outcome: 
Add a remedial work 


Task ID: 12vqe 
Job ID: 533 


Description: 


Material: Concrete 

Result: 1 Defects, 
0 Confirmed, 
0 Rejected 


Priority: 


Time schedule: 


Proceed to check and confirm defects? Save remedial work? 


< Back Next > 


v Yes 


Figure 10: Overview visualization of the Visual QC Figure 11: Menu to add remedial work to a QC result 
results for a specific element 


By selecting the “Next” button in the menu in Figure 10, the surveyor could view details for the detected defect, 
as can be seen in Figure 12. The annotated image, the label of the detected defect and the confidence level were 
displayed (left figure in Figure 12). By selecting the “Original image” button, the surveyor could switch to viewing 
the original image that was sent for automatic visual quality control (right figure in Figure 12). 


Upon confirming a detected defect, the surveyor was presented with the option to add a remedial work for the 
identified issue. The user-friendly menu to add a remedial work, as depicted in Figure 11, facilitated this process 
within the DigiTAR tool. To input the necessary information for the remedial work, the surveyor simply selected 
the relevant input fields on the menu. Upon selection, the HoloLens system keyboard was activated, allowing the 
user to type using hand gestures, making the data input intuitive and efficient. 


The ability to process the remedial work in real-time within DigiTAR provided valuable advantages. It allowed 
for immediate consideration of mitigation measures and enabled rapid decision-making to address the identified 
defect effectively. This dynamic workflow streamlined the process of adding remedial works and contributed to 
enhanced project management and quality control. 
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Job 533- Defect 1 <Back g Job 533- Defect 1 


Predicted label: Crack Predicted label: Crack 
Confidence level: 0.995 Confidence level: 0.995 


= Original = Annotated 
4 image a image 


Figure 12: Visualization of a VQC result. The user can view the annotated (left) and the original image (right). 


5. CONCLUSIONS AND FUTURE WORK 


Embracing automation in the construction industry leads to improvements in the adaptability of the project and 
paves the way for greater innovation and advancement. As technology continues to evolve, leveraging automatic 
tools becomes a crucial aspect of staying competitive in the ever-changing construction landscape. 


This study presents a framework for semi-automated visual quality control inspection in construction sites during 
the execution phase of the project. The framework incorporates two tools; the Visual Quality Control (VQC) tool 
and the Digital Twin visualization with Augmented Reality (DigiTAR) tool. The first tool incorporates a deep 
learning network trained to detect concrete and railway defects and serves as a backend service for automatic 
visual quality control on images captured at construction sites. The second tool leverages AR technology to display 
the visual quality control results on-site. The surveyors can inspect the detected defects in-situ and confirm or 
reject them. They are also prompted to add remedial works, if needed. DigiTAR displays the 3D BIM model of 
the construction site, i.e., the model is visualized to overlay the actual site, allowing construction professionals to 
interact with the BIM model in a dynamic and realistic manner using AR technology. This critical functionality 
enhances the overall understanding and visualization of the construction site, promoting better decision-making 
and coordination throughout the project lifecycle. 


By combining automated quality control (performed by the VQC tool) with DigiTAR's intuitive interface and 
augmented reality capabilities, the surveyors gain real-time access to the quality control outcomes. This facilitates 
decision-making and enables prompt confirmation of the results, ensuring the construction project adheres to the 
highest quality standards. The seamless flow of data and information between the automatic quality control system 
and the DigiTAR tool enhances efficiency and accuracy, ultimately contributing to the successful execution of the 
construction project. The proposed framework aims to demonstrate how the synergy between cutting-edge 
technology and user-friendly interfaces can create a powerful asset for construction professionals in ensuring top- 
notch project outcomes. Future efforts will be dedicated to improving and expanding the model's training to 
encompass a wider range of defects. This endeavor aims to enhance the model's accuracy and efficiency in 
detecting various types of issues within the construction site. Additionally, the image acquisition procedure could 
be automatized and significantly improved utilizing drones and construction site inspection robots (such as Spot 
robots that are used for automated laser scanning), since our framework has been developed to support this 
functionality. Finally, there is a plan to equip the model with the capability to detect defects on video streams and 
empower DigiTAR to also display the video captures. This enhancement will enable real-time monitoring and 
analysis of ongoing construction activities, empowering construction professionals to address potential issues as 
they arise. 
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A COMPARATIVE STUDY OF DEEP LEARNING MODELS FOR 
SYMBOL DETECTION IN TECHNICAL DRAWINGS 


Benedikt Faltin, Damaris Gann & Markus König 
Ruhr University Bochum, Germany 


ABSTRACT: Symbols are a universal way to convey complex information in technical drawings since they can 
represent a wide range of elements, including components, materials, or relationships, in a concise and space- 
saving manner. Therefore, to enable a digital and automatic interpretation of pixel-based drawings, accurate 
detection of symbols is a crucial step. To enhance the efficiency of the digitization process, current research focuses 
on automating this symbol detection using deep learning models. However, the ever-increasing repertoire of model 
architectures poses a challenge for researchers and practitioners alike in retaining an overview of the latest 
advancements and selecting the most suitable model architecture for their respective use cases. To provide 
guidance, this contribution conducts a comparative study of prevalent and state-of-the-art model architectures for 
the task of symbol detection in pixel-based construction drawings. Therefore, this study evaluates six different 
object detection model architectures, including YOLOv5, YOLOv7, YOLOv8, Swin-Transformer, ConvNeXt, and 
Faster-RCNN. These models are trained and tested on two distinct datasets from the bridge and residential 
building domains, both representing substantial sub-sectors of the construction industry. Furthermore, the models 
are evaluated based on five criteria, i.e., detection accuracy, robustness to data scarcity, training time, inference 
time, and model size. In summary, our comparative study highlights the performance and capabilities of different 
deep learning models for symbol detection in construction drawings. Through the comprehensive evaluation and 
practical insights, this research facilitates the advancement of automated symbol detection by showing the 
strengths and weaknesses of the model architectures, thus providing users with valuable guidance in choosing the 
most appropriate model for their real-world applications. 


KEYWORDS: Computer Vision, Technical Drawings, Symbol Detection, Comparative Study 


1. INTRODUCTION 


Symbols pose an efficient and space-saving way of conveying complex information, enabling understanding across 
languages due to their standardized appearance. They find application in diverse contexts, such as street signs 
(Gudigar et al., 2016), maps (Huang et al., 2023), or technical drawings (Elyan et al., 2020b). For instance, in 
construction drawings symbols can represent architectural components, plumbing fixtures, elevation markings, 
and more, making accurate identification of the symbols essential for understanding the entire drawing. The 
importance of precise symbol detection becomes even more apparent when algorithms are used to understand the 
technical drawings automatically, e.g., for their digitization. Therefore, research has focused on developing 
effective and accurate algorithms for locating and classifying symbols in technical drawings. Ah-Soon (1998) 
proposed to adapt Messmer's algorithm for symbol recognition in architectural drawings using graph matching. 
The drawing is first vectorized, followed by the extraction and merging of geometric features for each symbol, 
and clustering similar symbols by type. In a separate work, Adam et al. (2000) developed an orientation and scale 
invariant method for recognizing symbols in technical documents. Their algorithm is based on the Fourier-Mellin 
transform, which extracts features used to label the symbols through a classifier. 


Most of this research focuses on traditional image analysis techniques, such as vectorization and feature 
engineering. However, with the advent of efficient deep learning approaches, such as convolutional neural 
networks (CNN), the research community's interest has shifted to use such models for localizing and classifying 
symbols in technical drawings. Ziran and Marinai (2018) leverages the object detection networks Faster R-CNN 
and Single Shot Detector (SSD) to detect symbols representing objects such as furniture, doors, and windows in 
architectural floor plans. In the context of piping and instrumentation diagrams (P&IDs), Mani et al. (2020) 
develops a custom CNN to classify components in the P&IDs. The proposed network is trained to classify patches 
cropped from the overall drawing into three classes: tag symbol, component symbol, or no symbol. When the 
network detects a symbol within a patch, the corresponding detection is projected back onto the original drawing. 
In a different approach for P&IDs, Elyan et al. (2020a) employs the YOLO architecture for symbol detection. 
Additionally, the authors propose a method based on generative adversarial networks to mitigate the issue of class 
imbalance in technical drawings. Class imbalance occurs when the number of instances per symbol class shows 
significant variation. A novel approach to enhance symbol detection by inferring the symbol orientation is 
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introduced by Faltin et al. (2023b). The authors leverage human pose detection networks, specifically Mask R- 
CNN and YOLOv7Pose, to achieve accurate symbol pose estimation in construction drawings. 


While the proposed approaches already utilize several different detection models, the overall field of object 
detection has experienced rapid growth, leading to the emergence of numerous model architectures (Zaidi et al., 
2022). Researchers and industry practitioners alike face the difficult task of selecting the most appropriate 
architecture for their specific applications. This becomes especially difficult when other critical requirements such 
as training time, inference time, or model size must be considered in addition to detection accuracy. For instance, 
in real-time applications, faster inference time may be prioritized, while in resource-constrained environments, 
smaller model sizes may be important. Therefore, researchers and practitioners need to conduct thorough 
evaluations of the various object detection models to make an informed decision. 


Previous research has compared different model architectures for different applications to aid in model selection. 
However, to the best of the authors' knowledge, there is no study that directly compares detection models for 
symbol detection in pixel-based drawings. Nevertheless, related publications have conducted comparison studies 
in various other domains. For instance, Brößner et al. (2022) compares nnUNet with the transformer-based Swin- 
UNet for bone segmentation in ultrasound images, finding that both networks are similarly applicable to the task. 
In another study, Wang et al. (2023) conducts a comparison for different backbones such as ResNet, Swin, and 
ViTAEv?2, by pre-training them on a large dataset of satellite images to later test them on different down-stream 
tasks, such as image segmentation or object detection. The authors discover that the transformer-based models 
show competitive results compared with the CNN models. In particular, ViTAEv2 achieves the highest 
performance on the different tasks. Moutik et al. (2023) compares several models based on CNNs, transformers, 
and hybrid approaches, to recognize human actions in video data. This study concludes that the CNN- and 
transformer-based models perform similarly, both with their strengths and weaknesses, but overall, the hybrid 
methods achieve the best results. 


Our research contributes to this broad research area by providing a comprehensive analysis of state-of-the-art 
object detection models and evaluating them based on several criteria for symbol detection in construction 
drawings. The remainder of the paper is structured as follows: Section 2 presents the methodological design of the 
comparative study conducted and details of the datasets used. Subsequently, in Section 3, we present and discuss 
the results of our comparative study, shedding light on the performance of different detection models. Finally, 
Section 4 summarizes the results and provides possible directions for further research in this area. 


2. METHODOLOGY 


Training 


Faster-RCNN with ResNet 
Faster-RCNN with Swin 


Eh ` Faster-RCNN with ConvNext 
Training object YOLOv5 


detection model YOLOv7 
YOLOv8& 


Synthetic 


training data Evaluation 
Accuracy 


Robustness 

Training Time 

Inference Time 
Testing data Model Size 


Fig. 1: Overview of the methodology used in the comparative study. 


The structure of this comparative study is shown in Fig. 1. Real data is collected and annotated to train and test the 
selected models. In addition, synthetic data is generated to extend the dataset. The models are trained with a 
combination of real and synthetic data, while only real data is used for testing. Section 2.1 provides a detailed 
description of the datasets. The comparison includes six different network architectures: YOLOvS (Jocher et al., 
2022), YOLOv7 (Wang et al., 2022), YOLOv8 (Jocher et al., 2023), and Faster R-CNN (Ren et al., 2017) with 
three different backbones namely ResNet (He et al., 2016), Swin (Liu et al., 2021), and ConvNeXt (Liu et al., 
2022). A brief introduction to the compared network architectures is presented in Section 2.2. In order to 
comprehensively assess the models, they are evaluated with respect to five criteria, i.e., accuracy, robustness, 
training time, inference time, and model size, which are explained in detail in Section 2.3. 
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SECTION C - Al, DATA SCIENCE AND ANALYTICS 


2.1 Datasets 


To obtain a meaningful comparison, the models are trained and tested on two distinct datasets representing different 
sub-sectors of the construction industry: bridge construction drawings and floor plans. The bridge-dataset 
comprises 15 real construction drawings with approximately 10.000 x 6.000 pixels dimensions. To make the large 
dimensions of the drawings manageable for the models, tiling is used to divide the drawings into patches. The 
drawings contain three types of symbols: section symbols, elevation markers, and dimension symbols (cf. Fig. 2). 
The size of these symbols varies from 20 to 300 pixels, which is relatively small compared to the overall size of 
the drawing. Consequently, the models must be able to produce meaningful feature representations even for small 
objects. To investigate this, the study explores different patch sizes to examine the influence of the symbol-to- 
image size ratio. Therefore, three patch sizes are considered: 256 pixels, 512 pixels, and 1024 pixels. This results 
in three data subsets named bridge 256, bridge _ 512, and bridge_1024, containing 2120, 1185, and 543 patches, 
respectively. 


All symbol classes of the bridge-dataset 


Section symbol Elevation marker Dimension symbol 


W_A ZN WV Sk KF 
Three symbol classes of the floorplan-dataset 


CH a 8 © apoB USE 
BOURA OEG BHAD 


Fig. 2: Example illustration of some representative symbols selected from the datasets. 


Since the number of patches is still quite limited, additional synthetic data is generated based on the procedure 
introduced by Faltin et al. (2022a). Lastly, to investigate the influence of the quantity of training data on the model's 
performance for the bridge-dataset, a reduced data subset called bridge_1024_red is created, which comprises only 
10% of the original data subset bridge_1024. Table 1 gives an overview of the composition of all datasets. 


Table 1: Overview of the composition of the data subsets. Synthetic data is only used for training and validation. 


Kinane No. synthetic No. validation No. synthetic No. testing 
No. training images eae. Lk . raor E ; 
training images images validation images images 
bridge 256 5611 1500 993 250 316 
bridge _512 3179 1500 544 250 192 
bridge 1024 1477 1500 250 250 94 
bridge_1024_red 130 170 25 25 94 
floorplan_1024 10294 0 1800 0 750 


On the other hand, the floorplan-dataset consists of 5000 fully annotated floor plans of residential buildings 
sourced from the CubiCasa-dataset (Kalervo et al., 2019). The majority of the drawings in this dataset are similarly 
sized ranging between 500 and 2000 pixels, therefore they are resized uniformly to 1024 pixels (floorplan_1024). 
The drawings distinguish eight different symbol classes: window, door, electrical appliance, toilet, sink, sauna 
bench, fireplace, and bathtub. Fig. 2 illustrates the high intra-class variation among these symbols, which requires 
effective generalization of the object detection models to handle the diverse styles. 


2.2 Object Detection Models 


In this study three single-stage detection models, namely YOLOv5, YOLOv7, and YOLOv8 are compared with 
three two-stage detection models. Each two-stage model modifies the Faster R-CNN architecture by replacing the 
backbone while keeping the region proposal network and detection head unaltered. The selected backbones are 
ResNet, ConvNeXt, and Swin. In this section, a short overview of the different architectures is given. 
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The YOLO model series is known for its high detection performance and fast inference speed. Both YOLOv7 and 
YOLOv8 are extensions of YOLOvS but incorporate different ideas such as anchorless detection and different 
convolutional blocks to enhance the base model’s performance. 


In comparison, ResNet is a well-established CNN-based image classification network commonly used as the 
backbone architecture for Faster R-CNN. On the other hand, the Swin backbone is a transformer-based architecture, 
building upon the concept of the visual transformer introduced by Dosovitskiy et al. (2020). Swin addresses the 
challenges of adapting transformers to the image domain by employing a hierarchical architecture with shifting 
attention windows. Lastly, ConvNeXt progressively modernizes the basic ResNet architecture by incorporating 
key components of visual transformers into the CNN-based architecture. 


For each of the chosen model architectures, the base size and the smallest version, referred to as tiny, are compared. 
This investigates whether equivalent results can be obtained with smaller model sizes. In general, larger models 
can extract more meaningful features due to their increased complexity, allowing them to handle challenging tasks 
while, in contrast, making them prone to overfit. The smallest versions are ResNet-50, Swin-Tiny, ConvNeXt- 
Tiny, YOLOvSn, YOLOv/7-tiny, and YOLOv8n, while the base-versions are ResNet-101, Swin-Base, ConvNeXt- 
Base, YOLOv5m, YOLOv7, and YOLOv8m. 


2.3 Evaluation 


To provide a comprehensive overview of the strengths and weaknesses of each model, they are evaluated based 
on five criteria: accuracy, robustness, training time, inference time, and model size. A detailed explanation of each 
criterium is given in the remainder of this section: 


2.3.1 Accuracy 


Accuracy measures the model’s ability to correctly locate and classify the symbols within the drawing. To assess 
the accuracy the standard metric of mean average precision (mAP) as defined by Padilla et al. (2021) is utilized, 
as it considers the precision and recall, evaluated at certain levels of intersection over union (IoU). While the 
precision indicates the proportion of accurate predictions made by the model among all the predictions, the recall 
denotes the proportion of correct predictions compared to the total number of ground-truth boxes available. Lastly, 
the IoU quantifies the overlap between a prediction and the ground-truth bounding box, measuring the model's 
localization accuracy. 


2.3.2 Robustness 


Obtaining and annotating extensive training data is time-consuming, resulting in data scarcity in many cases. 
Hence, it is highly desirable to have a robust object detection network, which means it performs reasonably well 
even with limited training data. The model is trained with 90% reduced training data to evaluate robustness. After 
training, the network's performance is tested with the full test dataset using the mAP metric. This enables an 
estimate of the accuracy decrease when training data is significantly limited. 


2.3.3 Training time 


Training time measures the physical time required for the model to reach a plateau, indicating that no further 
improvement is expected. This is particularly important in applications with limited computational resources or 
where the model is regularly retrained. While the training time may vary depending on the hardware used, in this 
study all models are trained on the same GPU to ensure comparability. 


2.3.4 Inference Time 


The inference time quantifies how long it takes for the model to make a single prediction, making it a vital metric 
for real-time applications. 


2.3.5 Model size 


Model size describes the memory requirements of the model and is a crucial factor that directly affects various 
aspects of model performance. Smaller models usually have a smaller number of parameters, leading to shorter 
inference times. However, extremely small models may not have the necessary complexity to handle the given 
task effectively. Therefore, it is important to carefully select the appropriate model size to achieve optimal 
performance. 
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3. RESULTS & DISCUSSION 


To perform the comparative study, the models are trained by running three Nvidia A100-SXM4-40GB GPUs. The 
two-level networks are available pre-trained solely on the ImageNet-1K dataset, which is a subset of the larger 
ImageNet dataset (Deng et al., 2009). Conversely, the YOLO models are available pre-trained exclusively on the 
MS COCO dataset (Lin et al., 2014). The AdamW (Loshchilov & Hutter, 2017) optimizer is employed with a batch 
size of 64 and an initial learning rate of 0.0001 for backpropagation. The maximum number of epochs is set to 400, 
but early stopping is utilized to prevent overfitting. In order to increase the overall quantity and diversity of the 
training data, data augmentation techniques such as flipping, rotation, scale, or translation are utilized for all data 
subsets. The results are presented and discussed in the following sections based on the different evaluation criteria. 


3.1 Results for Accuracy 


Table 2 shows the results of the comparative study focusing on symbol recognition accuracy. To evaluate the 
symbol detection performance with the mAP metric, an IoU threshold of 0.5 (mAP@0.50) is appropriate for most 
use cases. Nevertheless, the more stringent mAP@0.50:0.95 is also used in this study to evaluate the models more 
comprehensively. For the bridge dataset, ConvNeXt-B achieves the highest mAP@0.50 score of 0.996 on 512 x 
512 pixel patches, closely followed by ResNet-101, Swin-T, and Swin-B, all obtaining a score of 0.992 on 1024 x 
1024 pixel patches. This suggests that the symbol-to-image size ratio is not critical, as the models accurately detect 
smaller symbols even on larger patches. But on the contrary, performance generally declines with smaller patch 
sizes, with the lowest results observed on 256 x 256 pixel patches. This demonstrates the importance of context, 
as the models must consider each symbol’s surrounding space to locate and classify it properly. This is consistent 
with the observations of Lim et al. (2019), who similarly emphasized the importance of context when dealing with 
small objects. 


Table 2: Performance results of the detection models on the bridge- and floorplan-dataset. The bold fonts indicate 
the best results for the respective data subset, while the asterisk indicates the best results for the specific dataset. 


Bridge-dataset Floorplan-dataset 
256 x 256 Pixel 512 x 512 Pixel 1024 x 1024 Pixel 1024 x 1024 Pixel 
mAP So mAP}30:0.95 mAP So MAP§30:0.95 mAP So MAP? 30:0.95 mAP So MAP} 30:0.95 

ResNet-50 0.912 0.739 0.973 0.815 0.986 0.833 0.771 0.492 
ResNet-101 0.927 0.743 0.980 0.827 0.992 0.830 0.776 0.485 
Swin-T 0.945 0.769 0.961 0.801 0.992 0.846 0.789 0.517 
Swin-B 0.950 0.742 0.978 0.834 0.992 0.847 0.799 0.518 
ConvNeXt-T 0.934 0.763 0.992 0.862 0.991 0.855 0.795 0.527 
ConvNeXt-B 0.934 0.786 *0.996 0.864 0.991 0.862 0.806 0.524 
YOLOv5n 0.884 0.578 0.868 0.615 0.912 0.645 0.649 0.379 
YOLOv5m 0.886 0.704 0.951 0.799 0.973 0.838 0.761 0.510 
YOLOv/7-tiny 0.884 0.648 0.924 0.675 0.949 0.686 0.763 0.474 
YOLOv7 0.934 0.705 0.956 0.786 0.967 0.748 *0.816 0.551 
YOLOv8n 0.921 0.715 0.890 0.705 0.965 0.794 0.717 0.454 
YOLOv8m 0.938 0.814 0.965 0.854 0.982 0.891 0.805 0.570 


Regarding mAP@0.50:0.95, ConvNeXt-B still achieves the best results with 0.864. Conversely, YOLOvSn is the 
weakest performer in both mAP@0.50 and mAP@0.50:0.95, likely due to its lower model complexity affecting 
its ability to handle the complex symbol detection task. 
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For the floorplan-dataset, YOLOv7 leads by a significant margin (0.816), followed by ConvNeXt-B (0.806) and 
YOLOv8m (0.805). However, when considering mAP@0.50:0.95, YOLOv8m outperforms YOLOv7 with a 
notable gap. The general accuracy on the floor plans is lower compared to that of the bridge drawings. This 
discrepancy may be due to a larger number of symbol classes or a wider variety of symbol styles in the floorplan- 
dataset. 


Fig. 3: Predictions of the base models of Swin (left), ConvNeXt (center), and YOLOv8 (right) on a patch from 
the bridge-dataset (top) and the floorplan-dataset (bottom). 


Overall, larger and newer models perform better when accuracy matters. However, no significant difference in 
detection accuracy is seen between transformer-based and modern CNN-based networks. Despite that, the YOLO 
models tend to achieve lower mAP values than the two-stage models. This might be due to the YOLO model’s 
smaller size. One exception is the YOLOv8m model, which achieves the highest values of all YOLO models and 
sometimes even outperforms the two-stage models. 


3.2 Results for Robustness 


The results for the reduced training dataset are presented in Table 3. While ConvNeXt-B shows the highest mAP 
score on the full training dataset, its performance declined by 25% when trained on the reduced dataset. It is also 
noteworthy that YOLOv5n, which already performed the worst, showed the largest decrease with 51%. On the 
other hand, YOLOv7 is quite robust to reduced training data, showing an 18% drop with mAP@0.50:0.95 score 
of 0.614. It is followed by the second most robust network ResNet-101 with a decrease of 26%. 


3.3 Results for Training Time, Inference Time & Model Size 


Table 3 shows the training time, inference time, and model size of the compared object recognition networks. 
Among them, YOLOv8n has the shortest training time with 0.51 hours on average, while still achieving promising 
results, as shown in Table 2. On the other hand, the transformer-based Swin-B network requires the longest training 
time. This is probably due to the computationally intensive attention mechanism used by the Swin network. In 
terms of inference time, the YOLO models demonstrate superior performance compared with the two-stage models. 
Specifically, YOLOv7-tiny achieves the fastest inference time, averaging 1.6 milliseconds per image, followed by 
YOLOv?7 with 6.4 milliseconds per image. Surprisingly, despite its smaller network size, YOLOvS5n does not 
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display the fastest inference time. It is worth noting that even the largest network, ConvNeXt-B, has a relatively 
small memory requirement of 400 MB. 


Table 3: Performance results on the reduced bridge-dataset as well as training time, inference time, and model size. 
The bold fonts indicate the best results for the respective column. 


bridge_1024_red 


mAPI ot, Performance Training time in hrs. Inference in riba on Model size in MB 
decrease averaged 1024 x 1024 pixels 

ResNet-50 0.590 -31% 2.58 52.9 157.97 
ResNet-101 0.611 -26% 3.21 55.3 230.62 
Swin-T 0.635 -25% 5.19 70.4 170.95 
Swin-B 0.602 -29% 7.52 109.9 397.56 
ConvNeXt-T 0.633 -26% 4.20 64.5 171.88 
ConvNeXt-B 0.645 -25% 5.51 104.2 400.27 

YOLOvS5n 0.319 -51% 0.82 12.9 7.12 
YOLOv5m 0.541 -35% 2.67 17.8 80.77 
YOLOv/7-tiny 0.491 -28% 1.12 1.6 22.94 
YOLOv7 0.614 -18% 3.15 6.4 139.21 
YOLOv8n 0.570 -28% 0.51 12.9 11.49 
YOLOv8m 0.672 -25% 2.49 18.3 98.65 


In summary, it is important to find a balance between model size and accuracy. Smaller models may perform worse 
and models that are too large may overfit on the training data. Therefore, when choosing an appropriate model 
size, an optimal trade-off between model complexity and performance must be made based on the specific 
requirements of the application. 


4. CONCLUSION 


This paper provides a comprehensive comparison of six object detection models employed for the task of symbol 
detection in pixel-based construction drawings. The evaluated models include YOLOv5S, YOLOv7, YOLOv8, and 
Faster R-CNN, equipped with three different backbones, i.e., ResNet, Swin, and ConvNeXt. For each architecture 
the smallest and baseline model size are employed, resulting in a total of twelve compared models. The evaluation 
is conducted along five key criteria, namely accuracy, robustness, training time, inference time, and model size. 
This offers valuable insights to guide the selection of appropriate models for diverse use cases and their specific 
requirements. The models are trained and tested on bridge construction drawings as well as floor plans representing 
two common sub-sectors of the construction industry. 


Among the evaluated models, ConvNeXt shows the highest accuracy in the bridge-dataset, which makes it the 
best choice for detecting small symbols with low variance. On the other hand, when a high variance of symbol 
style is present, YOLOv7 or YOLOv8 are more suitable choices. Notably, YOLOv7 also proves robust when 
trained with reduced data, therefore making it superior to YOLOv8. Overall, the YOLO models generally have 
lower training and inference times, which can be directly linked to their smaller model size. Overall, the more 
recent models, including YOLOv8, Swin, and ConvNeXt, show comparably high accuracy, suggesting that 
significant improvements in the future will be limited. Nevertheless, promising research directions such as self- 
supervised learning (Jaiswal et al., 2021) and active learning (Schmidt et al., 2020) are expected to advance training 
efficiency by reducing the amount of training data required or the manual annotation effort. Moreover, expanding 
upon the proposal by Elyan et al. (2020a) for generating synthetic data through deep learning models, employing 
new methods such as stable diffusion (Rombach et al., 2022), could improve detection results even more, 
mitigating the scarcity of real data. 


The results of our study provide valuable insights for the future development of symbol detection research, 
enabling further exploration of model performance on other technical drawing types, such as P&IDs. Investigating 
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hybrid models that combine transformer- and CNN-based architectures represents another promising direction for 
future research. 
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ABSTRACT: There is rising demand for automated digital twin construction based on point cloud scans, 
especially in the domain of industrial facilities. Yet, current automation approaches focus almost exclusively on 
geometric modelling. The output of these methods is a disjoint cluster of individual elements, while element 
relationships are ignored. This research demonstrates the feasibility of adopting Graph Neural Networks (GNN) 
for automated detection of connectivity relationships between elements in industrial facility scans. We propose a 
novel method which represents elements and relationships as graph nodes and edges respectively. Element 
geometry is encoded into graph node features. This allows relationship inference to be modelled as a graph link 
prediction task. We thereby demonstrate that connectivity relationships can be learned from existing design files, 
without requiring domain specific, hand-coded rules, or manual annotations. Preliminary results show that our 
method performs successfully on a synthetic point cloud testset generated from design files with a 0.64 F1 score. 
We further demonstrate that the method adapts to occluded real-world scans. The method can be further extended 
with the introduction of more descriptive node features. Additionally, we present tools for relationship annotation 
and visualisation to aid relationship detection. 


KEYWORDS: BIM, Digital twin, GNN, machine learning 
1. INTRODUCTION 


Ageing industrial facilities often lack essential documentation, resulting in sub-optimal maintenance and 
breakdowns. Digital twins remedy this and assist in the operation and maintenance of industrial facilities. 
However, generating twins for existing facilities is a laborious and time-intensive process that outweighs the 
perceived benefits offered by the twins. (Agapaki et al., 2018). While this has resulted in significant interest in 
automation, current approaches merely segment elements and model their geometry. However, industrial facilities 
are composed of a vast number of interconnected elements of various categories; thus, the identification of their 
connectivity relationships is a crucial, yet challenging step in the digitisation process. 


The prevalent methodology for constructing geometric DTs from existing facilities, known as “Scan-to-BIM” 
(Tang et al., 2010) consists of the following steps: (1) raw data collection, (2) data preparation, (3) geometric 
modelling, and (4) semantic enrichment of the model. The focus of this paper is the final step. 


Semantic enrichment’ refers to the incorporation of various forms of additional information into a digital twin to 
enhance its value. Some common examples are element relationships, material information, damages to elements, 
and code compliance information. There are various types of ‘element relationships’ within industrial facilities. 
Tang et al (Tang et al., 2010) identifies three commonly found relationship types. Namely, topological 
relationships (e.g., a pipe being connected to an elbow), aggregation relationships (e.g., a pipe being within the 
HVAC (Heating, Ventilation and Air Conditioning) system) and containment relationships (e.g., a window 
belonging to a wall). This paper focuses on topological relationships. 


Topological relationships between various elements of a facility are a key component of its documentation. For 
instance, when diagnosing faults within building systems (Tang et al., 2010), carrying out maintenance tasks, or 
checking for code compliance (Bloch & Sacks, 2020), topological relationships must be identified in advance. 
They are also crucial when analysing sub-systems within a plant, which assists maintenance and monitoring. 


A variety of industry tools such as Trimble RealWorks, Leica Cyclone, ClearEdge3D EdgeWise are used in the 
DT construction process. Currently, element relationships modelling between elements is largely a manual process 
with industry tools providing limited automation. Tools such as EdgeWise can connect adjacent pipes, but not 
other piping elements. AVEVA E3D and PointSense offer the ability to derive pipe branches but require user 
guidance through point picking. A feature comparison of popular tools is given in table 1 (Son et al., 2015). The 
requirement of manual guidance throughout the modelling process is one of the primary pitfalls of current software 
solutions, necessitating significant expert labour. 
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Table 1. Comparison of automation features and pipe modelling functions (SA=Semi-automated, FA=Fully- 
automated, NA=Not-available, Source: (Son et al., 2015) 


Task Trimble RealWorks Leica Cyclone ClearEdge3D EdgeWise 

Automation features 

Pipe Detection SA FA FA 

Part Recognition (e.g., elbows, tees) FA NA FA 

Model Creation FA FA FA 

Pipe modelling functions 

Straight pipe FA FA FA 

Elbow NA NA FA 

Tees NA NA FA 


This paper proposes a novel method for automatically identifying topological relationships between elements 
within industrial facility models using GNNs. We demonstrate our model’s performance both on synthetic and 
real-word data. Furthermore, we present tools for relationship visualisation and annotation to aid in the 
relationship detection process. Crucially, this research demonstrates the applicability of graph inference for 
element relationship inference in BIM. 


2. BACKGROUND 


Current literature on industrial facility DT construction predominantly encompasses the first three steps of the 
Scan-to-BIM process. Annotated point cloud datasets such as CLOI contains elements from classes such as 
Channels, Valves, I-beams, Flanges, Elbows, Cylinders and Angles (Agapaki et al., 2019). A prominent instance 
segmentation method utilizes CLOI-NET, a modified PointNet++ based neural network to identify element point 
clusters using the above dataset. This achieves 73.2% mean precision and 71.1% mean recall over all classes. 
However, results for more complex shapes such as flanges are considerably lower, especially in the presence of 
occlusions. Another approach proposes ResPoint++, which uses an encoder-decoder structure trained on I-Beams, 
R-Beams, pumps, pipes, and tanks (Yin et al., 2021). Xie et al proposes PipeNet for modelling straight pipes via 
centreline prediction (Xie et al., 2023). Once element point clusters are retrieved, they are geometrically modelled 
with approaches such as CAD model matching (Agapaki & Brilakis, 2022). However, the final semantic 
segmentation step, particularly in the form of relationship inference is yet unsolved, in industrial facilities as well 
as in other domains. Moreover, scan datasets with annotated relationships currently do not exist. 


Traditionally, element relationships are defined with various data schemas such as IFC. However, graph 
representations of IFC models have recently become prominent due to their ability to query information more 
effectively. In a graph representation, each element is represented by a graph node. Relationships between 
elements are depicted by edges. Graphs are well suited for the representation of building information as both 
spatial and non-spatial information can be stored as node or edge properties within a graph (Ismail et al., 2018). 


Previous attempts at automated detection of element relationships rely on hard coded rules. Nguyen et al. proposes 
an algorithmic approach to inferring relationships such as adjacency, containment, intersection, and connectivity 
from CAD models of elements (Nguyen et al., 2005). This requires complete CAD models without occlusions 
and is constrained by many assumptions. Another approach infers connecting tees and elbows based on pipe 
centrelines to predict pipelines (Oh & Kwang, 2021). Hard coded rules are unique to their domain. Thus, these 
approaches cannot scale to various domains, and are limited to a few common scenarios. To our knowledge, no 
published work exists that attempts to derive relationship information between various elements a laser scan. 


Such a task would require an understanding of the nature of element relationships within an environment. For 
instance, the existence of a pipe and an elbow in proximity and in alignment suggests that the two elements are 
linked. There is a diverse and non-exhaustive set of such instances where relationships can be inferred, especially 
in the presence of occlusions and barriers such as walls. Furthermore, these vary between domains; the types of 
relationships in a bridge are vastly different from those in an industrial facility. Thus, rule-based approaches to 
relationship inference tend to perform poorly. We posit that a method of automatically learning the nature of 
relationships in a particular domain is better suited for this use case. In particular, we focus on GNNs, which are 
geometric deep learning approaches capable of learning directly from graphs. 


GNN architectures can be split into spatial and spectral GNNs. Spatial GNNs such as GraphSAGE (Hamilton et 


al., 2017) and Graph attention Networks (GAN) (Veličković et al., 2017) create vector embeddings of graph nodes 
and aggregate features of adjacent nodes. In contrast, spectral GNNs such as Graph Convolution Networks (GCN) 
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(Kipf & Welling, 2016) are based on graph message passing. Other influencing factors include size of the graph 
and type of task. Tasks can be broadly categorized into three types: node classification, link prediction and graph 
classification. The choice of architecture for a particular task is influenced by a variety of factors. For instance, 
GraphSAGE and GCN are by default suited for graphs without edge features. Furthermore, they both behave as 
inductive frameworks, allowing them to scale to nodes that are unseen during the training process. However, 
inductive frameworks are unable to perform prediction on a completely edgeless new node, as information cannot 
be propagated in the absence of edges. Methods such as Edgeless-GNN (Shin et al., 2021) address this 
shortcoming by introducing pseudo edges based on similarities in node features. These edges ensure message 
propagation to new nodes. 


There are very few applications of GNNs in the buildings domain. Buruzs et al. and Wang et al. both utilize graph 
representations of IFC models for the task of room type classification. They utilize a GCN and a modified version 
of GraphSAGE architecture respectively (Buruzs et al., 2022; Wang et al., 2022). They both model the task as a 
node classification problem and focus on indoor living spaces. Some of the edge features used within the graph 
representation include type of connection (e.g., Door vs wall) and material of connection (wooden vs metal door), 
while some of the node features include volume, height, oriented bounding box dimensions etc. The above 
methods demonstrate the suitability of graph learning in BIM. 


The above findings demonstrate that we do not yet know how to automatically detect relationships between 
industrial facility elements. We merely know how to identify individual elements in isolation, but a digital twin 
should represent the connectivity and interactions of the system. The aim of this work is to address this gap in 
knowledge by answering the research questions; (a) Which strategy to utilize for automated inference of 
topological element relationships with high precision and recall? And (b) How to train an element relationship 
inference model in the absence of annotated relationship datasets? 


3. PROPOSED SOLUTION 


We propose a method for automated topological relationship detection between elements. The scope of this 
research is limited to cylinders, elbows, tees, and flanges. These elements account for a majority of the modelling 
workload (Agapaki et al., 2018). 


Segmented element point clusters of existing industrial facilities are the input data source. These individual point 
clusters are extracted from a scan using an existing instance segmentation method such as CLOI-NET. Elements 
and their relationships are represented in the form of a graph. Each element is modelled as a graph node, and its 
geometric features and element class are encoded into a graph node feature vector. Relationships are represented 
by edges. Relationship detection is modelled as an edge prediction task and a GNN is trained for this purpose. 
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Figure 1. Overview of the proposed solution 


Element geometry information is encoded into graph node features. These features are used by the GNN to learn 
the influence of element geometry on connectivity between elements. A more descriptive node feature provides 
additional information to the GNN. Possible features include: (a) minimum oriented bounding boxes (MOBB) of 
elements, (b) element-specific parameters such as axis, radius etc., (c) sampled subset of points, and (d) learned 
feature vectors. Oriented bounding boxes can be easily derived but are limited to crude information regarding 
element geometry and position. These are heavily affected by outliers and errors in instance segmentation. In 
contrast, element specific features are more difficult to extract from point clouds, but are more descriptive, 
especially for shapes such as elbows. They can represent geometries accurately with few parameters. Examples 
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include element position (e.g., centre point), orientation (e.g., cylinder axis) and element geometry (e.g., radius, 
length). Element parameters of simpler shapes such as cylinders may be extracted using methods such as 
RANSAC, but more complex shapes such as elbows can prove challenging. Yet another approach would be to 
encode the element point cluster into a feature vector. Such an encoding may be generated automatically by using 
a deep learning approach such as PointNet (Qi et al., 2016) and would theoretically be capable of robustly 
representing the element. 


Manual annotation of a relationship dataset to train the GNN would require a significant amount of time and 
labour. Thus, we generate training data from design files which contain element relationship information. In terms 
of GNN architecture, we opt for GraphSAGE, as it is (a) capable of inductive learning, (b) suited for link 
prediction, (c) scales linearly in the number of graph edges and (d) has been successfully utilised for previous 
BIM applications (Wang et al. 2022). 


4. RESEARCH METHODOLOGY 


This research is designed upon the assumption that topological relationships between elements can be inferred 
from their geometric features, and that the nature of such relationships can be learned by a neural network. We 
further assume that the data loss from compressing point cloud instances to node features does not significantly 
affect the ability of a GNN to identify element relationships. 


The dataset used for training is composed of design files for an offshore Liquid Natural Gas hub in NavisWorks 
format. The subset utilized for experimentation comprises of two sub sections of the site containing around 37,000 
elements with around 31,000 unique topological relationships (Figure 2). Element relationships were extracted 
using a python script through the NavisPythonShell plug-in and geometries were extracted by the NavisTools 
plug-in. 


4177 


, = A 
4a,: SS — 


Figure 2. A subsection of the input dataset (left) and element relationship frequencies (right). 


Out of the proposed node feature representations, we test the bounding box and sampled subset of points strategies. 
The relative simplicity of these methods provides a starting point for assessing the feasibility of our solution. For 
the bounding box representation, we calculate the principal axis vector, centre-point, and dimensions of the 
minimum oriented bounding box. For the latter representation, we randomly sample 100, 500, and 1000 points 
for each element (Figure 3). 
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Figure 3. Node features derived from bounding box (left) and Element points sampled at 1000, 500, and 100 
points (right) 
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We experimentally determine that node features derived from bounding box geometry provide best results. Our 
graph representation consists of a graph whose nodes represent individual elements, and whose edges denote 
topological relationships between the elements. Each node is represented by a feature vector consisting of the 
above bounding box parameters. Specifically, the principal axis is represented by a 3D unit vector, and centre 
point and dimensions are also represented by 3D vectors. The class label of the element is appended to the feature 
vector using a one-hot encoding. 


The link prediction GNN and graph dataset were implemented using PyTorch and Deep Graph Library. The 
training/validation set consisted of around 20,000 elements and 17,000 topological relationships. 10% of this set 
was reserved for validation. Furthermore, a separate, disjoint section of the design file dataset was used as a test- 
set, ensuring that the trained model scales to unseen graphs. The test-set comprised of around 17,000 elements 
and 15,000 relationships. For training, we generate a positive graph containing all edges in the training set, and a 
negative graph containing the inverse of those edges. As each element could potentially be connected to every 
other element, the negative graph contains an exponentially large number of edges. Therefore, we sample a subset 
of edges to create a balanced training set. The sampling is performed dynamically during training to include all 
potential negative edges. Furthermore, we restrict our search to potential edges between elements within a pre- 
defined range and generate pseudo edges based on physical proximity between elements. 


Our proposed GNN is based on the GraphSAGE architecture and contains 2 GraphSAGE layers for node feature 
aggregation, as well as a 2-layer Multi-Layer Perceptron (MLP) for edge feature computation. The node features 
act as input for the 1 layer of the GNN. Ina single layer, features of each nodes’ neighbours are aggregated using 
mean aggregation, and combined with its previous features and a trainable weight vector. Next, edge features are 
computed to calculate probability of a link. We evaluate two prominent edge feature predictors, namely dot 
product and MLP, and determine that an MLP with two layers and a Rectified Linear Unit (ReLU) activation 
function yields best results. The model is trained using Binary Cross Entropy loss. 


5. RESULTS AND DISCUSSION 


Performance comparisons of hyperparameters are given in table 2. Training losses failed to converge when 
training with sampled points as node features. Thus, all results listed in the table utilize bounding box parameters 
and element class label as node features. All models were trained for 500 epochs, with Adam optimizer and a 
learning rate of 0.01. 


Overall, the system achieves a recall of 0.88, precision of 0.51 and F1 score of 0.64 on the test-set at 0.5 
classification threshold. A breakdown of model performance on relationships between various element types is 
given in table 3. The drop in both precision and recall is primarily due to errors in pipe-pipe relationships, 
stemming from disjoint piping elements. 


Table 2. Performance comparisons between various hyperparameters on validation set 


Precision Recall F1 score AUC-ROC 

Node representation update method (with MLP — 2 layers) 

1 SageGRAPH layer 0.967 0.771 0.858 0.879 

2 SageGRAPH layers 0.980 0.738 0.842 0.967 

3 SageGRAPH layers 0.976 0.737 0.840 0.924 
Edge feature computation method (with SageGRAPH — 2 layers) 

Dot product 0.700 0.955 0.806 0.900 

1 MLP layer 0.486 0.677 0.566 0.451 

3 MLP layers 0.937 0.736 0.825 0.855 


We also test model adaptability to real world data by testing on a manually annotated subset of the CLOI dataset 
containing flanges, elbows and pipes. Tees were excluded due to their low prevalence. The dataset contains around 
600 relationships between around 1100 elements. We develop a new relationship annotation and visualisation tool 
based on LabelCloud, an open-source point cloud bounding box annotation tool (Sager et al., 2021) for this 
purpose. The model achieves a recall of 0.99, precision of 0.75 and F1 score of 0.85 on this dataset. The higher 
performance is a result of the dataset being less complex than the design dataset with less densely packed elements. 
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Table 3. Recall and precision by element type, in design file (top) and CLOI (bottom) testsets 


Precision Flange Elbow __ Tee Pipe Recall ___ Flange _ Elbow _ Tee Pipe 
Flange 0.74 0.94 0.88 0.93 Flange 1.0 0.98 1.0 0.97 
Elbow 0.89 0.60 0.96 Elbow 0.97 0.97 0.94 
Tee 0.94 0.71 Tee 1.0 0.94 
Pipe 0.24 Pipe 0.76 
Precision (CLOI) Flange Elbow _ Pipe Recall (CLOI Flange Elbow Pipe 
Flange 0.25 0.93 0.85 Flange 1.0 1.0 1.0 

Elbow 0.14 0.92 Elbow 1.0 0.98 
Pipe 0.70 Pipe 1.0 


Additionally, we visualise results by programmatically generating cylindrical elements at connection points to 
denote topographical relationships (Figure 4). Most false positives and false negatives occur on elements with 
smaller radii. In particular, the model predicts false edges on small parallel pipes. Parallel pipes connected via two 
elbows are another cause of false positives. The model is also more likely to miss relationships in densely packed 
regions. Notably, errors are more prevalent among smaller connecting elements such as tees. This may be 
attributed to shortcomings of the node feature representation. Unlike pipes, smaller elements such as tees or 
elements do not have an unambiguous principal axis. Therefore, a bounding box representation may be inadequate 
to represent their geometry. Switching to more descriptive features such as element parameters may prove to be 
a valuable avenue for future work. Visual analysis of CLOI dataset results (figure 5) demonstrates that errors are 
primarily false positives mainly caused by noisy point clusters. This is explained by the high sensitivity of the 
bounding box representation to noisy points. 
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Figure 4. Model predictions (left), and failure cases (right) on design file testset and predictions on CLOI dataset 
(bottom) (True positive=Green, False positive=Yellow, False negative=Red) 
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Figure 5. predictions on CLOI dataset. Edges are denoted by lines between edges of element bounding boxes 
(True positive=Green, False positive=Yellow, False negative=Red) 


6. CONCLUSION 


We propose a novel method for automatically identifying topological relationships between elements within 
industrial facility models using GNNs. Specifically, this research is the first to accomplish this task without a rule- 
based approach. While significant improvements are required to match the precision and recall of human 
annotators, our method demonstrates the feasibility of automated relationship inference, and can be used as 
guidance for annotators. Many failure cases are caused by limitations of the bounding box node representation. 
These include sensitivity to noisy points and inability to represent geometry of smaller elements accurately in 
densely packed areas. Thus, there is significant potential for future improvement by substituting a more advanced 
representations such as element parameters or a learned feature representation. Another limitation of the proposed 
method is low performance in the presence of occluded points, which are common in large scale indoor scans. 


The method can also be extended to detection of aggregation relationships such as facility subsystems. 
Furthermore, the inferred relationship information may also be utilised as additional context to improve instance 
segmentation performance. Crucially, in contrast with previous hand-coded approaches in various domains, this 
paper presents an automated alternative to relationship inference, which is crucial to the semantic enrichment step 
of the scan-to-BIM process. It is thus suited for more complex scenarios, and is easily adaptable to other domains, 
making digital twinning more accessible to previously unexplored infrastructure domains. 
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ABSTRACT: Trajectory reconstruction of pedestrian is of paramount importance to understand crowd dynamics 
and human movement pattern, which will provide insights to improve building design, facility management and 
route planning. Camera-based tracking methods have been widely explored with the rapid development of deep 
learning techniques. When moving to indoor environment, many challenges occur, including occlusions, complex 
environments and limited camera placement and coverage. Therefore, we propose a novel indoor trajectory 
reconstruction method using building information modeling (BIM) and graph neural network (GNN). A spatial 
graph representation is proposed for indoor environment to capture the spatial relationships of indoor areas and 
monitoring points. Closed circuit television (CCTV) system is integrated with BIM model through camera 
registration. Pedestrian simulation is conducted based on the BIM model to simulate the pedestrian movement in 
the considered indoor environment. The simulation results are embedded into the spatial graph for training of 
GNN. The indoor trajectory reconstruction is implemented as GNN conducts edge classification on the spatial 


graph. 


KEYWORDS: Indoor trajectory reconstruction; Graph neural network, Building information modeling; Camera- 
based tracking; Spatial graph; Pedestrian simulation 


1. INTRODUCTION 


Indoor trajectory reconstruction refers to the process of estimating the path or trajectory followed by a moving 
object or person within an indoor environment. This can be useful in various applications, such as indoor 
navigation, activity recognition, or monitoring systems. There are different approaches to indoor trajectory 
reconstruction, including sensor-based methods and computer vision (CV) techniques. Sensor-based methods rely 
on sensors, such as accelerometers, gyroscopes, magnetometers, or depth sensors, to track the movement of an 
object or person. The sensor data is processed using techniques like sensor fusion or Kalman filtering to estimate 
the trajectory (Patron-Perez et al. 2015). This approach is commonly used in devices like smartphones or wearable 
devices. Wi-Fi or Bluetooth signals can also be used to estimate the location of a device within an indoor 
environment (Traunmueller et al. 2018). By measuring the signal strength from different access points or beacons, 
it is possible to determine the approximate position of the device. Trajectory reconstruction can be achieved by 
tracking the device's movements over time using signal strength variations. However, as people pay more attention 
to privacy, these methods have become more controversial and inconvenient, since it requires pedestrians to 
actively upload signals. CV techniques can be employed to reconstruct trajectories using visual information 
captured by cameras or depth sensors (Wong et al. 2022). These methods may involve object detection, tracking, 
and motion estimation algorithms. For example, by tracking the position of a person in multiple frames of a video, 
it is possible to reconstruct their trajectory within the indoor environment. In this regard, CV techniques are more 
acceptable for public as it required less information exposure. 


Person re-identification (ReID) is a CV task that involves identifying and tracking individuals across different 
cameras or video frames (Zheng et al. 2015). The goal of ReID is to match a person's identity across non- 
overlapping camera views or at different points in time within a video sequence. In scenarios such as closed circuit 
television (CCTV) systems, where multiple cameras are installed in an area, ReID can help track individuals as 
they move between camera views. It is particularly useful in crowded or complex environments where traditional 
tracking methods may fail due to occlusion or changes in appearance. ReID applies deep learning techniques such 
as convolutional neural networks (CNNs) to extract discriminative features from the person's appearance (Cheng 
et al. 2016), which are further compared among different individuals to match the same person’s features while 
differentiating them from others. However, ReID performs differently indoors and outdoors, as they differ 
significantly in terms of lighting conditions, camera placement, and occlusion. Indoor environments often have 
controlled lighting and less occlusion of pedestrian, which can result in more consistent appearance of individuals, 
thereby ReID algorithms usually could achieve better performance. However, indoor cameras are often installed 
at fixed positions with controlled angles and have narrow fields of views and limited camera coverage due to the 
narrow space of indoor environment and occlusion of building elements. It is not realistic to install cameras to 
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cover all indoor space as it requires large investment and maintenance cost in CCTV system. Hence, other 
techniques are required to enable indoor trajectory reconstruction. 


Building information modeling (BIM) is a powerful tool for building management and provides a common data 
environment to connect different platforms to support various applications (Song et al. 2022; Cheng et al. 2022). 
Integration of BIM and CV techniques has emerged as a hot topic in recent years and unlocks a lot of applications. 
For example, construction activities can be recognized and analyzed by CV algorithms, and the relevant 
information can be extracted and intergraded into BIM model to improve the construction progress monitoring 
(Deng et al. 2020; Braun et al. 2020). Furthermore, information about the building components can also be 
intergraded in the digital representation of the BIM model for automatic detection and identification (Troncoso- 
Pastoriza, et al. 2018). 


Graph neural network (GNN) is a type of deep learning model that is specifically designed to operate on 
unstructured data that can be represented as graphs (Zhou et al. 2020). The key idea behind GNNs is to propagate 
information across the nodes and edges of a graph, allowing each node to gather and update information from its 
neighbors. GNNs have shown promising results in various domains, including social network analysis, molecular 
chemistry, recommendation systems, and CV tasks involving graphs or structured data (Wu et al. 2020). In recent 
years, GNNs have been applied to the building design and management. Nauata et al. (2020) applied GNN to 
generate house layout following given relational architecture. Cheng et al. (2022) leveraged GNN to conduct 
crowd prediction in the building. In this regard, GNN is a potential technique to assist the indoor trajectory 
reconstruction as the building layout can be represented as a graph and can be further processed by GNNs. 


This paper proposed an indoor trajectory reconstruction method using BIM and GNN. A spatial graph is proposed 
to depict the indoor environment. Pedestrian simulation is conducted using the agent-based model established 
based on BIM model to enrich the spatial graph. The CCTV system is integrated with BIM model by camera 
registration, so that the information generated by CV algorithms based on the cameras’ videos can be related to 
specific location in the BIM model. With the information from CCTV system, the spatial graph is then processed 
by a GNN to reconstruct the indoor trajectory of pedestrian. Section 2 introduces the methodology in details, while 
Section 3 provides an example to illustrate the proposed method. 


2. METHODOLOGY 


The proposed framework of indoor trajectory reconstruction is shown in Fig. 1. The BIM model is first used to 
establish a spatial graph to describe the spatial relationship among the indoor spaces including corridors and rooms. 
To be specific, the floor plan of the Revit model is analyzed by a Dynamo algorithm to identify the entrances, exits, 
intersections and dead ends. These points would be nodes in the graph and corridors connecting these nodes would 
be edges. Besides, DWG file is exported from BIM model and imported into a pedestrian simulation software 
called AnyLogic. The movement of pedestrian inside the building is simulated. The required time for a person to 
move from one point to others is recorded and embedded into the graph as edge attribute. The CCTV layouts are 
linked to the BIM model by camera registration so that the identification of some person in the field of view of 
one camera can provide information of the location of the person in the building. ReID algorithm is adopted to 
identify a specific person across several cameras. The series of timestamps and positions of one person will be 
passed to the spatial graph and possessed by a graph neural network to identify the trajectory of this person. 


2.1 Integration of BIM and CCTV System 


Cameras have been widely used in buildings for safety and efficiency surveillance. Especially with the integration 
of artificial intelligence and building BIM, many intelligent applications have emerged. To unlock the potential of 
CCTV-BIM integration, the first step is to localize the cameras in the considered environment, based on which the 
event or person identified in cameras can be linked to specific position in the BIM model. 


2.1.1 Camera Registration with BIM 


Camera registration, also known as camera pose estimation, is the process of relating the camera coordinates to 
the real-world coordinates of objects or scenes. Some previous studies leverage conformity of geometric primitives 
such as points, lines, and planes to determine the translation, rotation, and scale in reference to as-planned models 
or real world (Asadi et al. 2019). These methods usually require manual operation and rely on predefined viewpoint 
assumption including camera position and orientation (Lukins and Trucco 2007; Rebolj et al. 2008). Asadi et al. 
(2019) automated the registration process by performing an augmented monocular simultaneous localization and 
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mapping and perspective detecting and matching between the image frames and their corresponding BIM views. 
Though automated methods are more efficient, manual approaches with low technical threshold are still common 
as the camera registration process is one-off for a scene as long as the cameras are fixed. 


Fig. 2 shows an example of manual camera registration with BIM model. The position and rough orientation of 
the camera are provided, so that the objects such as doors and walls in the field of view (FOV) can be easily 
mapped to the building elements in the BIM model. Several characteristic points at the intersection line of wall 
and floor, such as corner points of walls, will be selected in the FOV, while their correspondences will be identified 
in the BIM model. The function for transforming the pixel coordinates in camera’s FOV to global coordinates in 
the BIM model can further be established. For example, for the FOV in Fig.2, two characteristic points C4 (Ci) 
and C,(C3) are selected, for any point C(x», Yp) in the FOV, its corresponding point Cp can be determined by 
following equations. 


, _ pT% rat $ 
P T ge ae (x3 xi) +x (1) 
Yp T Yı 
Noes (y; — N+ 1 2 
Yp E Y2 — Yı) + yı (2) 


The above equation can only be used for cameras with no distortion, otherwise some corrections are needed. When 
more than two characteristic points are identified, the coordinates can take the average of the calculation results of 
every pair of the points using the above equation. For a FOV where characteristic points could not be found, several 
markers with known global coordinates can be set on the floor, which can be easily identify in the camera’s FOV 
to establish the transformation function. 


Camera 
registration 


Movement 
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direction 


Time consumption 
Within-camera 
tracklets 
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Indoor Trajectory 
Reconstruction 


Fig. 1: Proposed framework of indoor trajectory reconstruction 


Camera registration allows for accurate transformation between the real-world coordinates and the 2D coordinates 
in camera. It is crucial in multi-camera systems, where multiple cameras are used to capture a scene from different 
viewpoints. By accurately registering the cameras, it becomes possible to merge or fuse the information from 
different cameras and create a consistent and comprehensive representation of the scene. Overall, camera 
registration is a fundamental step in computer vision applications that involve cameras, enabling precise mapping 
between the real world and the image plane, and facilitating accurate measurements and analysis of the captured 
visual data. 
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Fig. 2: Example of camera registration with BIM 
2.1.2 Real-to-BIM 


With camera registration, the information captured by CCTV system can be reflected into BIM. By adopting CV 
techniques, pedestrian can be detected with a bounding box. It is assumed that the midpoint of the bottom line of 
the bounding box can roughly represent the location of a person if the detection module in Section 2.3.1 reaches 
certain accuracy. Based on this, the movement direction of a person can be identified, which can be further matched 
with the directions of different branches of corridors to estimate the trajectory of the person. Besides, the speed of 
the person can be further calculated, as the movement distance of the person in real world can be achieved through 
transformation and the time consumed is also known. 


As some cameras will be installed outside some rooms, the videos from cameras can be used to estimate whether 
a person has entered a room. Firstly, necessary information is extracted from BIM model using Dynamo, including 
door’s dimension, door’s location, and room’s name. The location of a door is simplified as a segment AB on the 
2D plane, while a person is abstracted as a point P. There are three cases based on the relative position of the line 
segment and the point, shown in Fig. 3. Based on this, we develop the algorithm to detect, when a person disappears 
from the video, the room he/she enters, or whether the person leaves, as shown in Fig. 4. For each point, we repeat 
such a process m times to collect the distances between this point and m line segments (doors), then regard the 
line segment that has the smallest distance as the final result. In other words, the closest door a person nears when 
he/she disappears is identified as the one the person enters. 


P p P 
\ | \ 
| l | 
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Fig. 3: Three cases for the shortest distance between a point and a line segment 
2.2 Spatial Graph 


Spatial graph is proposed to represent the indoor environment based on the BIM model. Besides, pedestrian 
simulation is conducted using agent-based model, which is derived from BIM model. The simulation results are 
then embedded in the spatial graph. With the information from CCTV system, the indoor trajectory can be 
reconstructed based on the spatial graph using GNN. 


2.2.1 Graph Construction 


Spatial graph is proposed by improving the medial axis transform (MAT) (Lee 2004) for indoor trajectory 
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reconstruction. As shown in Fig. 5, MAT adds nodes at every turning point in the building, while spatial graph 
skips those nodes that are not fork, because the trajectory of pedestrian will not have multiple possibilities when 
passing through this kind of nodes. What these two methods have in common is that the edges in the both graphs 
represent sections of the corridor. Spatial graph includes several kinds of nodes: “entrance/exit” nodes (in green), 
“dead end” nodes (in dark green), “room” node (in orange), “fork” nodes (in red) and “camera” nodes (in blue). 
Dividing the nodes into different categories can depict the indoor environment more accurately and provide more 
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information for GNN. 


Algorithm 1: DETECT the room in which each person enters 


Input: A person’s coordinate P 


A set of 2D vectors represent m doors D = { A; B1, A2Bo,..., AmBm} 


1 rid+ 0 
2 dist + œ 
3 for i + 1 to m do 


aoa oe 


Ci + the projection point of P onto A;B; 
r + (AP - A,B) / |AvB,I? 
if r <0 then 

| de |A;P| 

if r > 1 then 

| d+ |BiP| 

if 0<r<1then 

| dt |C:P| 

if d < dist then 

rid + j 

L. dist + d 


Mria < the midpoint of A,iaByia 
if |PMprial > kx |AriaBria| then 
| r0 
else 
| r+ rid 


20 return r 


Output: The room in which the person enters r € [0, m] 


Fig. 4: Pseudo code to detect the room in which each person enters 


(a) MAT (b) Spatial Graph 


Fig. 5: Spatial graph compared with medial MAT (Lee, 2004) 
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2.2.2 Pedestrian Simulation 


To achieve more information for indoor trajectory reconstruction, pedestrian simulation is conducted to analyze 
the behavior and the time required for a person to reach a specific position in the considered environment. We 
adopt agent-based modeling (ABM) for pedestrian simulation. ABM has been widely applied to simulate the real- 
world operation, particularly pedestrian movement, traffic network and manufacturing chain. The three main 
elements in ABM are agents, their attributes as well as behavioral rules (Cheng and Gan, 2013). Each agent is an 
autonomous component having its attributes defined by user-input parameters, such as its size and moving speed. 
Every agent also behaves according to a set of decision rules, for example, approaching multiple places in a 
specified sequence. With these characteristics, every agent in an environment persistently interacts with each other, 
in pursuit of specific objectives. 


There have been extensive case studies that were able to simulate human behaviors in various scenario. Said et al. 
(2012) modeled how occupants in a high-rise building react in case of emergency and find the quickest evacuation 
route. On the other hand, Liu et al. (2014) demonstrated the functionality of ABM in simulating typical pedestrian 
flow phenomenon, such as bidirectional flow in corridors and through bottlenecks. Their simulation result 
consisted of the flow rate, movement velocity and spatial density. Suggested by Seyfried et al. (2005), pedestrian 
flow is notably consistent with the fundamental traffic theory. This inspired us to analyze the speed-density-flow 
relationship for pedestrian movement, by adopting traffic flow theories. 


One typical ABM engine is AnyLogic, which possesses excellent user-friendliness by allowing graphical drag- 
drop control and advanced Java codes. Moreover, it supports realistic visualization by either component markup 
or importing geometric models from external application. In this paper, the BIM model is imported to AnyLogic 
using DWG file as intermedia. Then the walls of the building can be generated automatically. By setting the source 
and target of pedestrian flow, the movement of pedestrian in the considered indoor environment can be simulated. 
Besides, we also added “line service” at the positions of fire doors (which are normally closed) to consider the 
time delay of passing the doors. The pedestrian queuing behavior and parameter setting are investigated by Kim. 
et al. (2013), which laid the foundation for establishing a robust framework for our project. 


2.2.3. GNN-based Trajectory Reconstruction 


With the pedestrian simulation based on ABM, the spatial graph can be further enriched with the time consumption 
information from one position to the other within the indoor environment. Each edge in the graph has two features: 
“time consumption” and “pass_or_not”. The former is a feature to indicate the time required for a person to reach 
a specific position from the other. The time is taken as the average value detected from the simulation. The latter 
is a binary-class feature showing whether a person has passed a specific path, which is represented by an edge. 
The movement direction mentioned in Section 2.1.2 is used to estimate the path that the person is most likely to 
travel through, which is achieved by matching the detected direction with the directions of different branched of 
corridors. Besides, each node will have 5 features: x-coordinate, y-coordinate, category, timestamp when a person 
is detected, and speed of detected person. “Category” refers to the categories of node according to Section 2.2.1. 
Speed of detected pedestrian relies on the camera registration and CV techniques to estimate the speed. For those 
nodes that no pedestrian pass, the value of this feature is set as 0. Base on this graph representation, the indoor 
trajectory reconstruction can be formulated as an edge classification task on graph, as shown in Fig. 6, aiming to 
divide all edges into those that pedestrians pass by and those that do not. 
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Fig. 6: Edge classification for indoor trajectory reconstruction 
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2.3 Multi-target Multi-Camera Tracking and Re-Identification 


Given a query video, our method targets on detecting, tracking and identifying all people within this video, which 
is composed of three parts, a detecting module, a tracking module, and a ReID module. The whole framework is 
shown in Fig. 7. 
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Fig. 7: The framework for multi-target multi-camera tracking and re-identification 
2.3.1 Detection 


For each frame in the query video, we first apply a Faster-RCNN (Ren et al. 2015) to detect all persons within this 
frame. In detail, given a frame, a CNN is applied to extract pixel-wise features, which are then sent to a region 
proposal network (RPN) to generate region-of-interest (RoI) which may contain a person. These regions are fed 
into another CNN classifier to determine whether they correspond to a person or not. Finally, a non-max- 
suppression (NMS) method is applied for redundancy removal. NMS first outputs the highest scoring box and then 
suppresses all overlapping boxes with that box, repeating this process until all boxes are processed. 


2.3.2 Tracking 


The tracking module works for aligning objects in later frames with those in previous frames. We use a GNN 
called MPNTrack (Braso and Leal-Taixe, 2020) to achieve this goal. To be specific, for positive predictions in each 
frame, we first apply a RoIAlign operation to extract their features. We then construct a graph. Features of positive 
predictions in each frame are treated as nodes, and all prediction pairs across frames form the edges within this 
graph. For each edge in this graph, we encode its feature as the deviation between the features of its two end nodes. 
These edge features are then passed through a multi-layer perceptron (MLP) for classification. If two end nodes 
of an edge correspond to the same person within two frames, we label it as one, otherwise zero. In this way, we 
associate predictions across frames, which predictions correspond to the same instance and which are not. 
Therefore, we derive appeared instances in the query video. 


2.3.3  Re-ID 


Finally, we deploy a Re-ID module to identify these instances. We first forge an instance gallery. Given the training 
videos, the detection and tracking module was first used to obtain different instances. We then randomly select n 
(n=10 in our experiments) instances for each person and store them in the instance gallery. We extract feature 
vector for each instance from the well-trained Re-ID model and use these feature vectors as the high-level semantic 
representations for persons. During testing, for different instances obtained from the query video, we treat them as 
probe, extract their feature vectors and retrieve their identifications stored in the instance gallery. Specifically, we 
compute the cosine distance between the queried feature vectors and all the stored feature vectors, and treat the 
person with the least feature distance as the output identification. The detailed structure is shown in Fig. 8. 
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Fig. 8: The detailed structure of the Re-ID module (Luo et al. 2019) 


3. ILLUSTRATIVE EXAMPLE 


We selected a part of the HKUST campus with more than 200 rooms to demonstrate the method proposed in this 
paper. Firstly, we developed a Dynamo program to derive the spatial graph based on the floor plan from the BIM 
model. Fig. 9 shows the process of graph construction. Then DWG file is exported to AnyLogic to support agent- 
based pedestrian simulation (as shown in Fig. 10) to enrich the spatial graph. Currently, the simulation model is 
established automatically based on the DWG file, though some manual adjustment is needed. The pedestrian flow 
logic is established manually, which could be automated with some further developed. We extracted the average 
time for pedestrian to reach a position from the other as the “time consumption” feature for the corresponding 
edge. Besides, by selecting different combinations of starting and end points, the movement of pedestrian in the 
building following designated paths can be simulated, the results can be further used to train the GNN. 


(c) Graph construction (d) Spatial graph 


Fig. 9: Graph construction based on BIM model 


The CCTV system is integrated with the BIM model through camera registration, so that the information captured 
by cameras can be linked to the specific location in the BIM model as well as the spatial graph. In our experiment, 
we conduct camera registration manually for 6 cameras using characteristic points and Equation (1) and (2). With 
the ReID techniques, the same person appears in different cameras’ POV can be identified. The ReID model is 
pre-trained on a public benchmark -- DukeMTMC-reID (Ristani et al., 2016), and got an accuracy of 100% for the 
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100 pedestrians we observed in the building. The high accuracy may result from the stable lighting condition and 
fewer pedestrian appearing in the FOV of each camera, compared to the outdoor environment. 


Fig. 11 shows the feature maps and RelID process, while Fig. 12 provides an example of ReID results. 
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(b) 3D view of the pedestrian simulation 
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(c) Pedestrian flow logic 


Fig. 10: Pedestrian simulation based on AnyLogic 
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Fig. 11: Feature maps and Re-ID process 
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(a) Scene 1 (b) Scene 2 


Fig. 12: Examples of the Re-ID results for the two cameras’ videos 


Information including ReID results as well as the direction and speed of the detected person were embedded into 
the graph and processed by GNN to make edge classification. The indoor trajectory can be reconstructed by 
classifying edges into 2 categories: those the person passed, and those did not. The GNN classification achieved 
81.2% for the trajectories for the observed 100 pedestrians. We found that for some cases where pedestrians 
stopped during the process, GNN would produce some wrong classifications, since staying will affect the total 
time for a person to reach a specific location. 


4. CONCLUSION AND DISCUSSION 


This paper proposed an indoor trajectory reconstruction method integrating BIM and GNN. A spatial graph is 
proposed based on BIM to depict the connection of indoor spaces and integrate information captured by CCTV 
system. The CCTV system is related to the BIM model through camera registration. ABM-based pedestrian 
simulation is leveraged to simulate the movement of persons within the building, which provides more information 
to the spatial graph. Trajectory reconstruction is implemented using GNN, which works on spatial graph to 
aggregate information and classify edges. This study provides an automated approach to trace the pedestrian in the 
building, which could provide building managers with more insight in the indoor movement pattern and crowd 
distribution, and thereby could support a lot of smart applications such as indoor navigation, ambient-assisted 
facility management, precise product delivery, etc. 


The proposed approach still has several limitations. For environments with a large number of rooms, the CCTV 
system usually cannot cover all the entrances of rooms due to limited number of cameras. In this scenario, for 
those rooms whose doors are not in the FOVs of cameras, we could not achieve the time of staying in the room 
for a detected person only using cameras, hence it may affect the GNN’s performance on edge classification. Other 
techniques such as Internet of things could be explored to provide supplementary information for those positions 
that are not covered by cameras. Besides, optimization of camera layout can also be investigated to enlarge the 
coverage of camera and reduce blind areas. In addition, an automated camera registration method can also be 
included to improve the convenience of applying the proposed method. 


REFERENCES 


Asadi, K., Ramshankar, H., Noghabaei, M., & Han, K. (2019). Real-time image localization and registration with 
BIM using perspective alignment for indoor monitoring of construction. Journal of Computing in civil Engineering, 
33(5), 04019031. 


Braso, G., & Leal-Taixé, L. (2020). Learning a neural solver for multiple object tracking. In Proceedings of the 
IEEE/CVF conference on computer vision and pattern recognition (pp. 6247-6257). 


Braun, A., Tuttas, S., Borrmann, A., & Stilla, U. (2020). Improving progress monitoring by fusing point clouds, 
semantic data and computer vision. Automation in Construction, 116, 103210. 


Cheng, D., Gong, Y., Zhou, S., Wang, J., & Zheng, N. (2016). Person re-identification by multi-channel parts- 
based CNN with improved triplet loss function. In Proceedings of the iEEE conference on computer vision and 
pattern recognition (pp. 1335-1344). 


904 


Cheng, J. C., & Gan, J. (2013). Integrating agent-based human behavior simulation with building information 
modeling for building design. International Journal of Engineering and Technology, 5(4), 473. 


Cheng, J. C., Kwok, H. H., Li, A. T., Tong, J. C., & Lau, A. K. (2022). BIM-supported sensor placement 
optimization based on genetic algorithm for multi-zone thermal comfort and IAQ monitoring. Building and 
Environment, 216, 108997. 


Cheng, J. C., Poon, K. H., & Wong, P. K. Y. (2022). Long-Time gap crowd prediction with a Two-Stage optimized 
spatiotemporal Hybrid-GCGRU. Advanced Engineering Informatics, 54, 101727. 


Deng, H., Hong, H., Luo, D., Deng, Y., & Su, C. (2020). Automatic indoor construction process monitoring for 
tiles based on BIM and computer vision. Journal of construction engineering and management, 146(1), 04019095. 


Kim, I., Galiza, R., & Ferreira, L. (2013). Modeling pedestrian queuing using micro-simulation. Transportation 
Research Part A: Policy and Practice, 49, 232-240. 


Lee, J. (2004). A spatial access-oriented implementation of a 3-D GIS topological data model for urban entities. 
GeolInformatica, 8, 237-264. 


Liu, S., Lo, S., Ma, J., & Wang, W. (2014). An agent-based microscopic pedestrian flow simulation model for 
pedestrian traffic problems. IEEE Transactions on Intelligent Transportation Systems, 15(3), 992-1001. 


Lukins, T. C., & Trucco, E. (2007, September). Towards Automated Visual Assessment of Progress in Construction 
Projects. In BMVC (pp. 1-10). 


Luo, H., Gu, Y., Liao, X., Lai, S., & Jiang, W. (2019). Bag of tricks and a strong baseline for deep person re- 
identification. In the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 0-0). 


Nauata, N., Chang, K. H., Cheng, C. Y., Mori, G., & Furukawa, Y. (2020, Fall). House-gan: Relational generative 
adversarial networks for graph-constrained house layout generation. In Computer Vision—ECCV 2020: 16th 
European Conference (Part I 16, pp. 162-177). 


Patron-Perez, A., Lovegrove, S., & Sibley, G. (2015). A spline-based trajectory representation for sensor fusion 
and rolling shutter cameras. International Journal of Computer Vision, 113(3), 208-219. 


Rebolj, D., Babič, N. Č., Magdič, A., Podbreznik, P., & Pšunder, M. (2008). Automated construction activity 
monitoring system. Advanced engineering informatics, 22(4), 493-503. 


Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region 
proposal networks. Advances in neural information processing systems, 28. 


Ristani, E., Solera, F., Zou, R., Cucchiara, R., & Tomasi, C. (2016). Performance measures and a data set for multi- 
target, multicamera tracking. European Conference on Computer Vision Workshops (EECVW), Amsterdam, The 
Netherlands (pp. 43-51). 


Said, H., Kandil, A., & Cai, H. (2012). Agent-based simulation of labour emergency evacuation in high-rise 
building construction sites. In Construction Research Congress 2012: Construction Challenges in a Flat World 
(pp. 1104-1113). 


Seyfried, A., Steffen, B., Klingsch, W., & Boltes, M. (2005). The fundamental diagram of pedestrian movement 
revisited. Journal of Statistical Mechanics: Theory and Experiment, 2005(10), P10002. 


Song, C., Chen, Z., Wang, K., Luo, H., & Cheng, J. C. (2022). BIM-supported scan and flight planning for fully 
autonomous LiDAR-carrying UAVs. Automation in Construction, 142, 104533. 


Traunmueller, M. W., Johnson, N., Malik, A., & Kontokosta, C. E. (2018). Digital footprints: Using WiFi probe 
and locational data to analyze human mobility trajectories in cities. Computers, Environment and Urban Systems, 
72, 4-12. 


Troncoso-Pastoriza, F., Lopez-Gomez, J., & Febrero-Garrido, L. (2018). Generalized vision-based detection, 
identification and pose estimation of lamps for BIM integration. Sensors, 18(7), 2364. 


Wong, P. K. Y., Luo, H., Wang, M., & Cheng, J. C. (2022). Enriched and discriminative convolutional neural 


905 


CONVR 2023. PROC 


ONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


network features for pedestrian re-identification and trajectory modeling. Computer-Aided Civil and Infrastructure 
Engineering, 37(5), 573-592. 


Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Philip, S. Y. (2020). A comprehensive survey on graph neural 
networks. IEEE transactions on neural networks and learning systems, 32(1), 4-24. 


Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A 
benchmark. Jn Proceedings of the IEEE international conference on computer vision (pp. 1116-1124). 


Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., & Sun, M. (2020). Graph neural networks: A review of 
methods and applications. AI open, 1, 57-81. 


906 


IMAGE SEGMENTATION APPLIED TO URBAN SURFACE AND 
AERIAL CONSTRAINTS ANALYSIS 


Marco Lorenzo Trani & Federica Madaschi 
Politecnico di Milano, Milan, Italy 


ABSTRACT: The rapid progress of artificial intelligence (AI) has prompted the exploration of its potential 
applications in the construction industry, although at a slower rate. Since the starting point of a design is the 
analysis of the site 5 constraints, the purpose of the ongoing research is the application of artificial intelligence in 
risk assessment for site areas. The primary objective of this research project is to develop an interactive map that 
employs AI to identify potential surface and aerial interferences. This map aims to support planners, engineers, 
and architects during the site context analysis phase by providing real-time visualization of obstacles. The 
interactive map allows users to explore and analyze identified obstacles, enabling cluster markers and filtering of 
features. The results obtained from applying this approach in Milan, Italy, demonstrate its functionality and 
usability, highlighting the tool's ability to provide valuable information in both localized and citywide scenarios. 
Potential improvements such as size assessment and advanced marker generation are also being examined to 
enhance the management of surface and air interferences. The goal is to enhance the tool's functionality, accuracy, 
and planning efficiency in construction projects. 


KEYWORDS: Image Segmentation, Risk Assessment, Construction Site, Clustering Techniques. 


1. INTRODUCTION 


During the project execution phase, a multiplicity number of agents and situations may affect the organization and 
the functioning of a construction site in terms of time and costs of the project. Indeed, these elements of 'disorder' 
can cause activities outside the program with negative effects on the quality, work, and safety of workers. The 
presence of the yard can be also a potential operational problem as continuity, health, and safety must be ensured 
both within the yard and outside. Thus, both perspectives must therefore be studied bidirectionally, at the interface 
between the site and environment. In addition, potential problems that may arise in terms of the size and duration 
of the project must be carefully analyzed for the construction of the network and mobile infrastructure and mobile 
construction site. 


The operational criticalities represent the construction process variables, not necessarily known a priori, which 
may cause difficulties or inability to perform planned works. The analysis of potential criticalities in construction 
projects helps to identify operational issues and anticipate additional costs or time needed to avoid surprises during 
the work. For this reason, developing a specific design analysis is fundamental to arriving at the execution phase 
with an informed attitude toward possible problem solving. Operational criticalities can be organized into five 
criticality classes related to different areas, adopted as: 


© Surrounding situation: analysis of the several characteristics related to the construction site and its surrounding. 

© Production: analysis of the relationship and the organization between functional-spatial design elements, 
technological-productive design elements, and the utilization of human/techniques/materials/resources which 
are crucial for efficient and cost-effective execution. 

@ Specific design elements: analysis of programming aspects of a project which may be left incomplete for a 
conscious choice caused by specific difficulties in obtaining useful data to improve the design. 

© Health and safety of the site: evaluation of how preventive and protective equipment, organizational measures, 
and training can affect the time and cost. 

© Contingencies: analysis of those situations outside the construction site which may occur without generating 
surprise. 


The risk assessment is a crucial step in evaluating and comparing design options, as any identified issues can be 
addressed during both the design and execution phases. It is important to investigate criticalities beforehand to 
address and solve them during the execution phase. This helps to quantify any increased costs and construction 
times. Writing an operational criticality report can support the validation of the design and contribute to client and 
designer awareness of any unresolved criticalities. It is important to update the document regularly throughout 
each design phase. This ensures that any critical issues that are identified during the first phase are addressed, and 
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any new critical issues that arise in subsequent phases can be resolved with an improved level of detail. The 
question then arises of how to exploit existing models of artificial intelligence related to the analysis of images to 
create a support tool aimed at drafting the document on the identification and analysis of critical issues and its 
constant updating. Indeed, it may be applied during the different project phases: for example, when the site 
inspection has not yet been carried out, the analysis of images from Google Street View (GSV) of the area can 
help the designer to have a clearer idea of the context in which it will operate. Instead, if the analysis is carried out 
on the photographic survey of the site, also carried out at different times of the duration of the yard, it can help to 
keep the surrounding criticalities monitored. Some critical issues that may arise during the development of the 
work can be detected with greater precision and detail. Hence, this paper focuses on the operating criticalities 
relative to the surrounding situation of a yard. The project presented aims to create an interactive map through 
simple and easy-access implements. It wants to demonstrate how using an artificial intelligence model for image 
segmentation applied to input images from GSV, is possible to create an interactive map that accurately provides 
the position of possible criticalities. A tool of this type, although of simple structure, can be very useful to designers, 
architects, or engineers during an inspection. It can become a support from which to draw a list of elements to be 
evaluated once they arrive at the site, because obviously, it is not possible to avoid this activity. 


2. LITERATURE REVIEW 


The development of an interactive tool supported by artificial intelligence, that provides a constantly updated view 
of the critical issues due to the context can be a basis for the implementation of further improvements to better 
support designers and engineers. The dynamic nature of the map must be able to allow the continuous updating of 
the input data, ensuring a greater precision than the static maps provided by the geographic information systems 
(GIS) (D. Farkas et al., 2016) which may contain inaccuracies or outdated information, regarding the positioning 
of services and sub-services. 


The contextualization of the intervention plays a crucial role in construction projects. It involves a thorough study 
of the site, its surroundings, and the internal factors that directly impact the time, cost, and feasibility of individual 
operations. In today's construction industry, where companies often face significant pressure to meet strict time 
and budget targets, insufficient evaluation and consideration of surrounding constraints can result in an unsafe and 
accident-prone workplace (E. Rahnemay et al., 2017). Addressing these challenges, the integration of artificial 
intelligence with Building Information Modeling (BIM) has gained prominence in recent years (Y. Pan et al., 2022). 
This integration offers the potential to handle the vast amounts of complex and uncertain data present in 
construction projects more reliably and efficiently. 


2.1 The Dense Prediction Transformers Model 


Fully convolutional networks are the prototypical architecture for dense prediction (Long et al., n.d.; Sermanet et 
al., 2013). Dense prediction, a foundational challenge in computer vision, entails leveraging input images to 
generate intricate output structures such as semantic segmentation, depth estimation, and object detection through 
learning (Liu, 2021). 


Convolutions are linear operators with a restricted receptive field, which requires sequential stacking in deep 
architectures to attain a comprehensive context and substantial representational capacity due to the limited 
receptive field and expressivity of individual convolutions. The image segmentation model used for the 
development of the tool in this paper was developed by Ranftl et al., who introduced Dense Prediction Transformer 
(DPT). DPT is an architecture for dense prediction tasks that adopts an encoder-decoder design, where the encoder 
utilizes a transformer as its fundamental computational building block. Notably, the authors employed the vision 
transformer (ViT) proposed by Dosovitskiy et al.. 


Thus, this model introduces a distinct architecture that replaces the conventional convolutional neural network. 
The main advantage of the vision transformer lies in its ability to generate a consistent and high-resolution global 
receptive field at each stage. Unlike the traditional convolutional approach, which examines individual windows 
gradually, transformers possess a unique mathematical architecture that establishes relationships between each 
neuron or zone in an image and each other. As a result, transformers show a relational nature, considering the entire 
image simultaneously in each position. This attribute facilitates the generation of predictions that are more refined 
and globally consistent than fully convolutional networks. 
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3. SURROUNDING SITUATIONS 


The surrounding situations class is crucial as thoroughly examines the variety of factors that are closely related to 
the construction site and its surroundings. These factors can have a significant impact on work time and cost, 
influence the construction site’s layout, and how materials are stored, handled, and manufactured ( Marco Lorenzo 
Trani, 2012). 


Speaking of yard contextualization, are numerous categories of operating criticalities to analyze and report in the 
analysis of criticalities document. Indeed, within it, problems deriving from site location in an urban fabric, 
hydrogeological characteristics of the site, subsurface constraints and due to the sub-services, aerial and surface 
constraints, also analysis of the environmental impact of the yard and the interference it may have with other 
nearby activities, are reported. Therefore, based on the categories of objects that the DPT model can recognize, 
seven elements have been identified on which to base the realization of the model (buildings, trees, plants, 
signboards, streetlights, skyscrapers, and poles), representative of certain categories just mentioned, going to focus 
the attention on localization in the territorial context, surface features, surface features, aerial restriction, and 
interferences with other activities. 


The first mentioned controls the general access conditions to the construction site which may represent a potential 
operation criticality. For example, where the primary road is restricted, uneven, or overcrowded, it may be 
imperative to carry out extra measures to enhance the current road infrastructure or build new infrastructure to 
satisfy requirements. In case of temporary unavailability of the usual routes, the absence of alternative routes can 
further think hard about the planning of certain supplies to reduce the risk of a failed delivery or of lack of 
construction site-free spaces. The presence of road constraints for example represents a potential operational 
criticality in relation to the need to acquire dispensations or permissions from the public body. If the usual routes 
are temporarily unavailable and there is no alternative, it’s important to carefully consider the planning of supplies 
to minimize the risk of failed deliveries or lack of space at a construction site. Road constraints, such as the need 
for dispensations or permissions from public bodies, can also create operational challenges. In the project 
developed this translates into the realization of the street network for the area under examination based on 
driveways, to provide an analysis of the critical issues concerning the main roads. 


The technological-architectural, urban, and naturalistic preexistences represent a source of potential criticality, for 
example, the presence of a cantilever roof that, because of its height, doesn’t allow the site access. As part of the 
project, the surface features were considered in the analyzed area by identifying the nearby buildings. Evaluation 
of neighboring buildings, in addition to influencing the height development of the yard, may also determine the 
choice of specific workings to avoid damage to elements not belonging to the site. 


The presence of plants, trees, or poles in the area can pose a significant challenge to the safe and efficient operation 
of construction activities. This includes both aerial and mechanized handling, as well as the installation of 
temporary structures like scaffolds. Therefore, the aerial restriction analysis is important to ensure that these 
obstructions do not hinder the proper functioning of the construction site and that cranes can rotate freely at night 
without interfering with nearby buildings. 


Lastly, the construction site's proximity to other productive activities can be a potential operational issue since the 
continuity, healthiness, and safety of both subjects must be ensured. If the site involves public services, the 
protection of service users is also considered in the critical analysis. Both perspectives need to be investigated in 
the interface between the construction site and the environment. The healthiness or hazardous elements generated 
from the environment to the site must be evaluated concerning the anthropic use of the environmental system. 
Additionally, potential issues that may arise in terms of the project's size and duration must be carefully analyzed 
for the construction of network infrastructure and mobile yards. 


4. THE PROJECT 


The purpose of the presented project is to identify potential surface and aerial interferences that may affect the 
area where a construction or civil yard is expected to open. To achieve this, an interactive map was created using 
the Python library "folium" to pinpoint the exact location of these obstacles. Python code was used to generate a 
street network for an interactive map using Google's OpenStreetMap. Users could choose to create the network 
for an entire city or a portion by entering coordinates or neighborhood names. The Lambrate and Citta Studi 
neighborhoods in Milan were analyzed for this paper project. By incorporating the street network into the code, 
geographic coordinates were established that allowed downloading images from Google Street View. To ensure 
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that ample data was gathered for identifying constraints, the network points were settled to be downloaded at a 
consistent distance of 25 meters. 


In Figure 1, the road network generated is shown. Figure 2, with the orange points, shows the 2625 points identified 
for the analysis: each of them is characterized by an ID which is associated with the longitude and latitude of the 
position where it is located. After which, the corresponding images through GSV were downloaded. It’s important 
to note that there isn't a direct correspondence between points and images, as GSV provides multiple frames from 
different angles. This feature worked to benefit the project, as it allowed to detect possible interferences with 
greater accuracy. 
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Fig. 1: Milan neighborhoods street network Fig. 2: 2625 street network points arrangement 


Once the images were downloaded the DPT Model was utilized to analyze them. Rather than using object detection, 
image segmentation was chosen in the tool for obstacle identification due to its approach. It processes of classifying 
each pixel of the image into a class or label that identifies the occupied area of the object that can be recognized 
by the algorithm. This model proved to be highly effective in recognizing a wide range of possible obstacles, even 
in urban environments with numerous overlapping elements, limited image quality, or small obstacle sizes. In 
Figure three, how the process works it shown. The first row displays the input data in the form of images from 
GSV. Meanwhile, the second row showcases the model's image segmentation results. By utilizing pixel 
classification based on the labels integrated into the model, the reconverted objects like machines, poles, and 
bicycles can be identified in the images. 
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Fig. 3: Example of images segmented by the model 


The model was trained to recognize over a hundred objects and use a filtering command to store outcomes for the 
most relevant constraint elements, including buildings, trees, plants, signboards, streetlights, skyscrapers, and 
poles a new CSV file with results was created. 


After conducting Image Segmentation, the focus shifted toward visualizing the identified constraints. Upon 
counting the objects, it was observed that there were over 10,000 potential constraints that were recognized. The 
table below categorizes and counts these constraints. 
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Table 1: Number of elements recognized from 10500 images analyzed by Image Segmentation Model 


Element Building. Tree Plant Signboard Streetlight Skyscraper Pole Total 


Total 1882 1868 1563 1749 1655 4 1499 10220 


Using the Folium library, constraint indicators have been configured with precise geographical coordinates from 
the road network generated previously: the code iterates each element of the CSV file containing the data related 
to the project. For each obstacle a marker has been generated that has been assigned a color based on a color 
dictionary for quick and easy identification. Included is the option to select markers, which displays a popup with 
specific information such as object name, image segmentation result, latitude and longitude coordinates, and 
reference image. Additionally, is possible to have multiple markers linked to the same image input. This is due to 
the fact that the image segmentation was done on the same input, resulting in different outcomes for the two objects. 
The reason for this is that during analysis, the model searches for all the elements it was trained for, as shown in 
Figure 3. Therefore, having the same references within the popup is not an indication of an error. 


Let's examine the output tools in detail, focusing on their key features and assessing the advantages and 
disadvantages of the chosen representatives. The main difference between them lies in the type of marker used. 
The first tool employs CircleMarkers to indicate obstacles, while the second uses ClusterMarker. 


To identify the position of obstacles in the studied area, the first tool created uses CircleMarkers for each element. 
These markers are color-coded based on the results obtained, allowing for a visual representation of the data. All 
the obstacles identified in Table 1 are displayed on the final map, thanks to the code implementation. Figure 4 
depicts how the output appears to a user who has zoomed in on a specific area. 
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Fig. 4: CircleMarkers output map, zoom in is applied 


To facilitate the reading of the position of markers, two solutions have been adopted. The first is general, inserted 
inside the code as a constraint for the positioning of the various indicators according to the reference category. In 
fact, an offset in the generation conditions within the map was made so that there was not a total overlap that 
prevented the display. Instead, the second solution concerns the possibility of managing the layers on which the 
markers have been inserted. Figure 4 shows that points with the same coordinates are slightly separated from each 
other. However, the exact location can be determined by selecting the marker of interest for the popup display. 
Different size radii were used to distinguish between the various markers, with their size gradually decreasing. To 
display or hide a marker, simply select it from the drop-down menu located on the left side of the map. 


The second map proposed shows surface and air constraints via ClusterMarker. These markers are sued to clearly 
showcase clusters of data that are focused on a specific point while also indicating the number of elements in a 
micro zone. This is achieved by displaying circular markers that contain a numerical value within the area it covers. 
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By adjusting the map zoom, the markers can be enlarged or reduced. The groupings are distinguished by their 
color: green indicates a few markers, while red indicates a large number. In Figure 5 a visual representation of the 
second type of output is reported. 
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Fig. 5: ClusterMarker output map 


This second output exhibits the same characteristics as the CircleMarker version but with distinct functions and 
interpretations. For instance, the filter option is always incorporated to allow for data selection, but instead of 
concealing layers, it directly hides the obstacle category. Moreover, the legend, located at the bottom right, no 
longer changes color scale based on outcomes, but rather on the classification of constraints, assigning a specific 
color to each. Figure 6 illustrates a zoom-in on the map to see more clearly the markers. However, when multiple 
groups appear at the same location, it can be challenging to get in an immediate overview of constraint positions. 
It is necessary to click on the specific interest group to view it. Moreover, there may be overlaps between groups 
associated with different categories and various ClusterMarkers, which can make the results less clear and 
immediate. Figure 6 demonstrates also how a popup appears when a marker is selected. 
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Fig. 6: ClusterMarker output map, zoom in and popup are applied 
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5. TOOLS APPLICATION AND VALIDATION 


An application case was used to perform a thorough analysis of the tools' operation. In this case, the focus was on 
analyzing the criticality of the intervention that needs to be carried out at the junction of via Carlo Pascal and via 
Celeste Clericetti. Figure 6 depicts the map of the area before results from the analysis were included. The blue 
rectangle represents the intervention's position, while the red circle marks the specific area to be analyzed. 


+ 


Fig. 7: Identification of the location of the assumed construction site and the area to be analyzed 


To identify critical points, Figures 8 and 9 were analyzed, which report the output results obtained using the 
developed tools. 


Fig. 8: Operational criticalities from tool Output 1 


In Figure 8, there is a depiction of the constraints located near the intersection that has been identified as the area 
of analysis. The main issues that may arise are related to the presence of trees and buildings, which could 
potentially cause problems during future air handling procedures. Despite this, the arrangement of markers appears 
to be tidy and easily discernible, although this could be due to the high zoom used on the area and the existence of 
only two primary categories of elements. 
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In contrast, Figure 9 displays the outcomes obtained by utilizing tool number 2. Upon comparing these results with 
the previous image, it becomes apparent that reading the results, in this case, is neither rapid nor straightforward. 
The presence of numerous ClusterMarkers in one location hinders readability as one must open them to view 
specific locations. Additionally, this grouping only applies to Markers that belong to the same category. As shown 
in the figure, when multiple elements that belong to different categories overlap, their groupings also overlap. This 
can make it difficult to read the results when using the map at a higher zoom level than what is shown in the figure. 
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Fig. 9: Operational criticalities from tool output2 


Although with the respective problems, identifying critical points in the area was relatively easy. Table 2 clearly 
shows the number of obstacles that could potentially cause issues during the operational phases of the future 
construction site. It is evident that the main obstacles are the buildings and trees adjacent to the area being analyzed, 
as previously mentioned. 


Table 2: Number of elements recognized in the analyzed area 


Element Building. Tree Plant Signboard Streetlight Skyscraper Pole Total 


Total 23 32 6 0 3 0 8 69 


Therefore, the proposed project has achieved its objective of reliably identifying objects and their positioning on 
the map with good results while acknowledging operational limits. However, it is important to note that the clear 
display of obstacle placement seen previously may not always be guaranteed. The first tool requires the application 
of a filter to provide a clear view of individual obstacles, although improvements have been planned for the overall 
view. On the other hand, the instrument with ClusterMarkers may have poor readability due to the overlap of 
different object groupings at the same point. 


Typically, to assess potential risks in a particular environment, images are analyzed to identify any potential 
obstacles. Hence, it's crucial to study how the model segmented the images and locate these obstacles on output 
maps. The proposed code provides visual support for this analysis. Two of the images extracted by GSV for the 
analysis of Via Carlo Pascal (Figure 10) are shown below. Normally the designer would identify and manually 
report the possible obstacles, such as the presence of trees for air handling, or car parking in case the occupation 
of public land was necessary. Analyzing images through the image segmentation model, this procedure becomes 
assisted and facilitated. Comparing the original with the results obtained and shown in Figure 10, it is possible to 
see how elements such as cars, trees, sidewalks, and poles are recognized and marked distinctly. The classification 
of image pixels according to the elements for which the model has been trained can therefore be of fundamental 
help where the overlapping of objects makes it difficult to recognize them. 
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Fig. 10: Image segmentation results for the street analyzed in the example 


However, by exploiting this type of approach for the analysis of criticalities, two fundamental limits can be 
identified, one technical and one "operational". The technical limit concerns the elements for which the model 
used has been trained. Indeed, there are about 150 objects that it can recognize, and not all of them are useful 
results for the purpose of this research. The solution to this limitation could be solved by developing a model 
trained to recognize a list of specific objects, related to the sector. Clearly, the realization is not immediate because 
in-depth knowledge of this type is beyond our competence. 


The "operational" limit consists in not being able to rely entirely on the instrument. The validity of the results in 
terms of object recognition and positioning according to geographical coordinates has been obtained and 
demonstrated with excellent results. However, it should be remembered that an analysis carried out by this tool 
does not completely replace an analysis conducted directly by the designer. As stated above, the tool wants to be 
a support to ensure greater accuracy in the assessment of criticalities, but it is good to remember the existence of 
a margin of imprecision that only human experience can fill. 


6. POSSIBLE IMPLEMENTATIONS AND CONCLUSIONS 


The improvement and further development of the tool could lead to higher output quality and the implementation 
of additional functionalities. One of these functionalities could involve classifying objects based on their height 
and identifying them within the map as either aerial or surface constraints. 


By adopting this classification approach and changing the output type, it would be possible to develop a tool that 
neglects the classification of objects according to their category of belonging. The recognized elements would still 
be positioned on the map based on their geographic coordinates; however, the color scale representing them would 
be based on the evaluation of their heights. Assuming to give the possibility to the user enters the reference value 
beyond which a constraint is considered aerial, the chromatic scale could then be defined based on this input that 
would represent the central value. 


Nevertheless, regardless of the approach adopted for the development of such a tool, the utilization of artificial 
intelligence models, like the one employed in this study, has proven to be a proactive way to identify potential 
difficulties related to construction site organization and planning operations. However, it is essential to recognize 
that the images used as input, sourced from Google Street View (GSV), may have limitations in terms of quality 
and detail. Therefore, it is plausible to consider that the effectiveness and accuracy of the tool could be optimized 
by employing photographic surveys executed with appropriate instrumentation. This integration would enable the 
tool to provide more detailed and precise results concerning the construction site context, facilitating better analysis 


915 


and identification of critical areas. In addition to the possibility of employing specific and higher-quality 
photographic surveys, considering the development of a dedicated AI model for this purpose could further enhance 
the capability to identify and address challenges inherent in construction site organization, offering a tailored 
solution for this domain. 


In conclusion, adopting these measures would provide designers with a more sophisticated and efficient tool to 
navigate the complexities of construction sites, reducing the risk of errors or unforeseen complications, and 
enhancing overall decision-making in construction planning and management. 
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ABSTRACT: This paper suggests the potential application of generative artificial intelligence-based image 
generation technology in the field of architecture, for early phase shape planning, using the styles of renowned 
architects. The study employed the following approaches: 1) Intensive image generation based on the styles of 20 
architects to test the Al's recognition ability and image quality. 2) Additional training was conducted for architects 
with low recognition rates to construct an enhanced learning model in the quality of image generation. 3) In 
addition to generating architectural visualization images using existing architects' design styles, alternative styles 
were proposed through design combinations, aiming to concretize ambiguous idea communication in the early 
stages of design and enhance its efficiency. The study sheds light on the future prospects of applying this generative 
AI model in the field of architecture. 


KEYWORDS: Design Style of Architects, Generative AI, Image Generation, Fine-tuning 


1. INTRODUCTION 


In the field of architecture, visualization plays a crucial role in comprehending and evaluating complex design 
alternatives and spatial qualities [Greenberg, 1974]. Especially in the early design stages, it allows clear expression 
of design ideas and spatial concepts, enabling the identification and resolution of potential issues and facilitating 
effective communication among stakeholders [Akin, 1978]. Ultimately, early-stage visualization defines the design 
direction, enhances collaboration, efficiency, and leads to better outcomes. However, creating high-quality 
visualization images, particularly during the abstract design phases, remains challenging. While advancements in 
3D modeling and rendering have improved the realism of visualizations, the process still demands time and 
specialized skills [Fonseca, 2017]. Currently, the emergence of AI and machine learning-based image generation 
models offers the ability to create images from text in a short timeframe. Applying this technology in the field of 
architecture has the potential to expedite the design process and foster creative design solutions. 


Building upon this, our research focuses on the feasibility of generating architectural visualizations using AI-based 
image generation method. In Chapter 3, we tested the performance of the image generation AI model based on 
architects’ styles, and in Chapter 4, we conducted additional training based on the test results. Finally, Chapter 5 
demonstrates the practical applications of the Image generation AI including trained model. 


2. BACKGROUNDS 
2.1 Architectural visualization generation methods 


Architectural visualization has evolved significantly over the years, transitioning from traditional manual 
techniques to embrace the power of digital technology. Historically, architects relied on hand-drawn sketches, 
physical models, and paintings to communicate their design ideas [Kehir Al-Kodmany, 2001; Atilola et al., 2016]. 
These methods, though expressive, had limitations in terms of scale, precision, and the time-intensive nature of 
creation. As architecture moved into the digital era, Computer-Aided Design (CAD) emerged as a game-changer, 
enabling architects to produce accurate and editable digital representations of their designs [Chiu, 1995]. It marked 
the beginning of a transformative shift in architectural visualization, offering architects the ability to iterate rapidly, 
explore design alternatives, and create highly detailed virtual models. 


As technology continued to advance, architectural visualization expanded its horizons to encompass photorealistic 
rendering, three-dimensional (3D) modeling, and immersive experiences [Koutamanis, 2000]. Sophisticated 
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rendering software, bolstered by powerful Graphics Processing Units (GPUs), enabled architects to create high- 
fidelity visualizations that realistically conveyed materiality, lighting, and texture. 3D modeling provided a 
comprehensive understanding of spatial relationships [Eastman, 1999], offering architects the ability to manipulate 
and analyze their designs in a virtual environment [David et al., 2022]. This progress in technology not only 
increased the efficiency of the design process, ultimately leading to better-informed design decisions and more 
visually impactful presentations. 


2.2 Image generation artificial intelligence (AD 


In 2014, Generative Adversarial Networks (GANs) emerged as a dominant paradigm for image generation research. 
GANs showcase their prowess by creating realistic images through competitive training involving a generator and 
a discriminator [Goodfellow et al., 2014]. As the stability of GAN training methods improved, the focus shifted 
towards generating images with specific attributes and refining the generated outputs [Karras et al., 2020]. These 
techniques have been applied to comprehend the information conveyed in architectural drawings, making it 
interpretable for computers. [Kim et al., 2019; Kim et al., 2020] 


Since 2020, within the diverse landscape of image generation AI platforms, several notable options have emerged. 
Midjourney [Oppenlaender, 2022] specializes in style blending, empowering users to influence the fusion of 
multiple styles within the generated images. DALL-E 2 [Ramesh et al., 2022] creates images from textual 
descriptions, showcasing the potential to transform words into visuals, despite occasional inconsistencies. In 
contrast, Stable Diffusion [Rombach et al., 2022] leverages a diffusion model, ensuring stability during training 
and providing the capacity to manage image quality and intricacy. It shows immense promise in bridging the gap 
between abstract architectural concepts and their visual manifestation. 


Among these, Stable Diffusion holds particular promise for architectural visualization research, given its ability to 
handle complex image transformations, align well with architectural subtleties, provide stability during training, 
and offer control over output quality and detail [Oppenlaender et al., 2023; Borji, 2023]. This positions Stable 
Diffusion as a potent tool to bridge the gap between architectural concepts and visual representation, redefining 
how architects approach their work and streamlining the creative process. 


2.3 Potential for architectural visualization automation 


There has been extensive research in image generation AI; however, its full potential for architectural visualization 
has yet to be realized. This research introduces a novel approach to architectural visualization using image 
generation AI models, emphasizing their transformative impact on this field. By harnessing advanced machine 
learning techniques, the study explores innovative methods to enhance architectural visualization, including text- 
to-image generation, which creates images from textual descriptions [Saharia et al., 2022]. This capability enables 
the generation of highly realistic images, making it a versatile tool with significant potential for various 
architectural visualization applications. 


Architects’ Styles 


Design Idea Generative AI 2D Visualization 


Input Processing Output 


Fig. 1: Research approach: Image generation AI based architectural visualization 


3. INTENSIVE TEST OF IMAGE GENERATION AI WITH ARCHITECTS’ STYLE 
3.1 Image generation test for architects’ styles 


Image generation artificial intelligence (AI), particularly Stable Diffusion (SD), involves two primary methods. 
The first method generates images from a text prompt, known as text-to-image generation. The second method, 
image-to-image generation, requires a seed image in addition to text prompts to generate images based on both 
inputs. In this paper, we focus primarily on text-to-image generation which generates images (Img) using the 
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"generate()" function, requiring a AI model (M), parameters (Param), and prompts ( P,). 


generate(M, Param, P,) = Imgg a) 
Param = {resolution, sampling method, sampling steps, CFG scale} (2) 
P, = {SDP, RQP} (3) 


The Param consist of four components: resolution, determining the image size in pixel; sampling method, 
selecting method for image extracting from latent space; sampling steps, defining the number of extraction 
stages; and Classifier-free guidance scale (CFG scale), specifying the influence level of the prompt. The P, 
consist of Scene Description Prompts (SDP), describing the target scene, visual composition, and graphic style, 
and Resolution Quality Prompts (RQP), adjusting the image's quality. Additionally, to prevent errors, each prompt 
composition includes negative prompts to specify what should be excluded. Table 1 provides example prompts 
corresponding to its composition. 


Table 1: Prompt composition and its examples. 


Composition of P, Positive Prompt example Negative prompt example 


MSA A residential house, professional photograph, g ee a . 
Scene description ait: i . . Commercial buildings, painting, sketch, bird’s-eye 
photorealistic rendering, deep depth of field, high- a , . 
prompts (SDP) cya ie 3 . view, isometric, portrait, cropped view, etc. 
key lighting, two-point perspective, etc. 


. . realistic shadows, enhance-detail, v ray rendering, low quality, too much noise, normal quality, 
Resolution quality ` . . i . : . 
full HD, masterpiece, highly detailed, high quality, watermark, blurry textured, blurry, noise, faint, text, 
prompt (RQP) 8k. et ; 
, etc. etc. 


In this section, we tested the performance of the text-to-image method defined earlier for generating architectural 
visualization. We randomly selected 20 architects who have received architectural awards or have had significant 
international influence, and generated images reflecting their styles. While additional descriptive keywords could 
enhance image quality by further delineating each architect's features, we excluded them for a clearer assessment 
of the default model's architect’s style recognition capabilities. Instead, we used only the prompt "Architect's name- 
inspired residential house" and prompts associated with photorealistic rendering, commonly used in architectural 
visualization. We generated approximately 100 to 150 images for each architect in a local PC environment, with a 
resolution of 1024 by 512 pixels. The generated results are summarized in Figure 2. 


None style Low Recognition High Recognition 
“A residential house” “Renzo Piano-inspired residential house” “Frank Gehry-inspired residential house” 


Fig. 2: Result of text-to-image generation test 
3.2 Findings and ongoing inquiry in image generation AI 


The generated results were assessed based on three criteria for their alignment with P,. This assessment 
encompassed: (1) Style fidelity, which measures the accuracy of depicting the design characteristics of architects, 
(2) Domain fidelity, which verifies the representation of unique features for residential houses, and (3) Image 
quality, assessing the extent to which the photorealistic style rendering prompt was reflected in terms of graphic 
style, composition, and resolution. 


The image generation test results indicated that the current SD model achieved a high level of domain fidelity and 
overall image quality. However, it exhibited low recognition for specific architects’ styles, regardless of their 
prominence, resulting in lower quality and less detailed images of generic Western-style residential houses without 
any corresponding style features. As a result, the need for further additional training of the existing image 
generation model to address these limitations in recognizing certain architects’ styles became evident. Motivated 
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by this necessity, we conducted additional training, specifically targeting Architects' design styles, as depicted in 
Figure 3. 


Dataset Trained model 
Additional Training 
Architects’ Styles 
Input Processing Output 


Fig. 3: Research Overview: Additional training for architects’ styles 


4. ADDITIONAL TRAINING FOR ARCHITECTS’ STYLES 
4.1 Additional training and data preparation 


If the majority of generated images (Img,) do not match the target image group (/mg,), it is required to replace 
the current model (M) with an alternative model (M’'). This replacement can involve either substituting the model 
or enhancing it through further training. In this chapter, Low Rank-Adaptation (LoRA) approach [Hu et al., 2021] 
is employed for additional training, aiming to improve the recognition of specific architects’ styles and to generate 
images that appropriately belong to the Img+. The target model (M,) is developed using the "train()" operator, 
based on the base model (M), hyperparameters (Hyperparam) and a target training dataset (D+). 


Most of Imgg ¢ Img, > M' > M (4) 
train(M, Hyperparam, D,) = M, E€ M' (5) 
Hyperparameters play a significant role in both the model's learning process and the subsequent performance of 
the M,. We specifically focused on three crucial hyperparameters: the training batch size (BS,), the number of 
epochs (epoch), and the learning rate (æ). At the same time, the effectiveness of additional training relies on a 

high-quality dataset (D,) containing image data (Imgp) along with corresponding annotation text files (Txtp). 
Hyperparam = {BS,, epoch, a} (6) 
D; = {IMgp1,Txtpy, »-IMGpn, TXtpn} (7) 
The additional training process, as depicted in Figure 4, involves two essential steps: (1) dataset preparation 
[Abdallah et al, 2017] and (2) training [Hu et al., 2021]. During the dataset preparation phase, meticulous training 
data collection is required to ensure alignment with P,. Preprocessing phase aids in removing unnecessary content 
that might disrupt the training process. It is crucial to ensure content quality of training data, and the 


correspondence between Imgp and Txtp. Following this, the Txtp is paired with the respective Imgp, and the 
prepared D, is then trained using the specified Hyperparam. 


Collection Preprocessing Pairing Embedding 


(1) Dataset preparation (2) Training 


Fig. 4: Additional training process 
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4.2 Additional training of existing model with architects’ styles 


In this chapter, we provided additional training to architects who received low or no recognition in the image 
generation test discussed in Chapter 3. We conducted a few-shot learning using the previously defined training 
approach. By incorporating the trained LoORA model (M+) into the image generation function, the possibility that 
generated images (Imgg') closely resemble the designated Img; is notably improved compared to the previous 
results. When utilizing M+, in addition to M, it is crucial to input the application weight (W), a value ranging 
from 0 to 1, where 0 represents 0% and 1 represents 100%. 


generate(M'V M(M,,W), Param, P,) = Img, (8) 


We compared the performance of the default model (M) with the trained model (M;) by generating images with 
both. The image generation process followed equation (8), and the parameters (Param) and prompts (P_t) used for 
image generation remained consistent with those used in Chapter 3. As shown in Figure 5, the existing model had 
very low recognition rates for certain architects, so even with a full weight, the specific features of those styles 
were not represented. However, when using the trained model, these features are correctly displayed, and their 
application is proportional to the weight. The additional training allows us a wider range of style options that the 
original model could not achieve. 


Default model 1.0 Trained model 0.5 Trained model 1.0 
“SANAA- inspired residential house” “SANAA- inspired residential house” “SANAA- inspired residential house” 


Fig. 5: Additional training results: SANAA style 


5. DEMONSTRATIONS 


Our investigation revealed that Al-driven image generation rapidly produces high-quality architectural 
visualizations from text prompts, empowering architects to easily create reference images and visualizations from 
the start of the design process. This chapter demonstrates the practicality of Image Generation AI, particularly 
Stable Diffusion, across various architectural styles. The three applications include: (1) building additional training 
models for desired architects’ styles, (2) generating architectural visualizations applying an individual architect's 
style, and (3) generating style alternatives by combining more than two styles. 


5.1 Implementation of different styles through additional training 


In this scenario, we employ image generation AI to incorporate diverse architects’ styles, providing users with 
desired visualization outcomes through additional training. In this chapter, we conducted additional training 
following the process outlined in Figure 4, targeting five architects with very low recognition rates, aiming to 
enhance the model's level of detail. To ensure high-quality training images, we sourced project photographs from 
reputable sources, such as Architects’ official websites, focusing on full facades in 1-point or 2-point perspective. 
Preprocessing involved image resizing and the removal of excessive information. Text data was constructed for 
each image, extracting from interviews with architects, expert analyses, and prior research about their styles. Each 
target style was trained with 15-25 datasets in average, with hyperparameters {1, 100, 0.0001}, and it took 8-15 
minutes per each training. 


The resulting model files, incorporated into the existing model, produce architectural exterior images closely 
mirroring architects’ design styles, even when data is limited. In this chapter, we generated five M, files, each 
representing the styles of different architects, capable of producing high-quality images comparable to those shown 
in table 2 of chapter 5.2. 


5.2 Visualization of design alternatives from text prompts 


This scenario describes how we acquired a diverse set of creative reference images representing different architects’ 
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design styles. In this chapter, we applied the M, developed in the previous chapter to M in order to generate 
architectural visualizations based on the styles of 20 selected architects, using the same prompts as those used in 
the image generation test in chapter 3.1. We generated approximately 100 to 150 images for each architect based 
on equations (1) and (8), with the parameters {(1024, 512), Euler a, 20, 7}. These images, as demonstrated in 
Table 2, accurately reflect not only their respective styles but also maintain the essential characteristics of 
residential buildings, even for architects with little prior experience in residential projects. These generated outputs 
provide a rich source of diverse and concrete ideas and inspirations right from the initial stages of architectural 
design, streamlining communication and facilitating the design process. 


Table 2: Resume of generated visualizations from text prompts 


Input prompt 


Output 


Descriptive Keywords 


LM. Pei-inspired 
residential house, 
Photorealistic 
rendering prompt set 


Modernist, minimalist, 
geometric, cultural fusion, 
monumental, symmetrical, glass 
and steel, iconic, etc. 


Renzo Piano-inspired 
residential house, 
Photorealistic 
rendering prompt set 


Lightness, Transparency, 
industrial materials, fluidity, 
civic and public focus, open 


spaces, etc. 


Le Corbusier-inspired 
residential house, 
Photorealistic 
rendering prompt set 


Modernism, functionalism, free 
fagade, open floor plans, 
concrete, horizontal windows, 


etc. 


SANAA-inspired 
residential house, 
Photorealistic 
rendering prompt set 


Minimalist, subtle elegance, 
organic forms, conceptual 
simplicity, fine steel structure, 


white color, transparency, etc. 


Shigeru Ban-inspired 
residential house, 
Photorealistic 
rendering prompt set 


Sustainability, paper architecture, 
wooden modular structure, 
organic design, grid, organic 
forms, patterns, etc. 


Frank Lloyd Wright- 
inspired residential 

house, Photorealistic 
rendering prompt set 


Antoni Gaudi- 
inspired residential 
house, Photorealistic 
rendering prompt set 


Organic architecture, prairie 
style, horizontal lines, flat roofs, 
clerestory windows, cantilevered 
overhangs, etc. 


Curved lines, mosaic and tilework, 
nature-inspired design, whimsical 
details, unconventional forms, use 


of color, etc. 


Mies van der Roe- 
inspired residential 
house, Photorealistic 
rendering prompt set 


Minimalism, steel and glass, open 
floor plans, linear and geometric 
design, Bauhaus influence, 
international style, etc. 


Ex) Photorealistic rendering prompt set = Positive prompts: professional photograph, photorealistic rendering, realistic, enhance-detail, v ray 
rendering, full HD, masterpiece, highly detailed, high quality, 8k, two-point perspective, exterior view, full shot, deep depth of field, £/22, 
35mm, high-key lighting, natural lighting, realistic shadows; Negative prompts: low quality, bad proportion, awkward shadows, unrealistic 
lighting, pixelated textures, too much noise, unrealistic reflections, normal quality, watermark, bad perspective, confusing details, blurry 
textured, blurry, noise, cloudy, faint, text. 
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5.3 Combination between architects’ styles 


This scenario illustrates the creation of diverse image references by blending multiple architectural styles, resulting 
in novel and previously unseen styles. Users can expand their architectural image references using image 
generation AI by combining the styles of two or more architects. The P, and Param for these operations are the 
same as those in other image generation cases, except for the SDP (Scene Description Prompts), which is 
observable in Table 3. This setup allows for a comparison between the results of applying a single style and the 
application of multiple styles, facilitating an assessment of the progress of the operations. 


Table 3: Example of combination of architects’ styles using text-to-image method 


Classification Mono-style: SANAA style Mono-style: Luis Barragan style Multi-style: SANAA and Barragan 
Model Trained model 
Parameters Resolution: 1024 X 512 / Sampling method: Euler a / Sampling steps: 20 / CFG scale: 7 
tapat SANAA-inspired residential Luis Barragan-inspired residential SANAA and Luis Barragan-inspired 
Prompts house, Photorealistic rendering house, Photorealistic rendering residential house, Photorealistic 
prompt set prompt set rendering prompt set 


Output u mu 
Descriptive Minimalist, elegance, sensitivity, Minimalism, color, geometry, Fine structures, colorful, rectilinear, 
Keywords tine steel structure, white color, concrete, simplicity, play of light concrete, simplicity, geometry, etc. 
simplicity, transparency, etc. and shadow, etc. 


As shown in Table 3, the combination of two different styles is evident and noticeable. When the curvilinear style 
of SANAA is combined with the rectilinear style of Luis Barragan, the curvilinear aspect of SANAA becomes less 
pronounced. Additionally, the resulting style incorporates the color palette and materiality of Luis Barragan, along 
with SANAA's distinctive design feature of thin structures. These findings demonstrate that image generation AI 
can create new alternative styles based on existing ones, potentially generating a variety of additional alternatives. 


6. CONCLUSION 


This research marks the initial steps in exploring the potential of architectural visualization through image 
generation AI, with a specific focus on the Stable Diffusion model. The study underscores the significant impact 
of image generation AI, particularly in the field of architecture and its application in early-stage architectural 
visualization. Leveraging deep learning and image generation techniques, we trained the model to capture the 
distinctive styles of renowned architects, using this knowledge to visualize typical residential houses. Our testing 
revealed that while the default SD model generally produces high-quality architectural visualizations with domain 
fidelity, it does face limitations in recognizing the unique styles of architects. However, we demonstrated that these 
limitations can be improved through additional training, highlighting the powerful potential of image generation 
Al. 


This approach plays a pivotal role in bridging the gap between abstract design concepts and tangible visual 
representations, empowering architects to effectively convey their creative ideas. Integrating AI technology into 
architectural visualization broadens creative possibilities, enabling architects to explore a diverse range of design 
alternatives. Looking ahead, further research is essential to develop comprehensive and refined methods for 
additional training, expanding beyond architects' styles to other targets. Additionally, the focus should be on 
enhancing the accessibility and utility of this technology by exploring other generation methods, such as image- 
to-image, and the development of user-friendly tools. 
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ABSTRACT: This paper describes an approach utilizing Generative AI to support diverse design alternatives for 
building facades based on the local identity. Extensive research is currently being conducted for exploring the 
applications of LLM-based generative AI models to diverse kinds of visualizations. By applying generative AI to 
facade design, the study aims to develop additional training models that generate alternative design options 
reflecting local identity, facilitating the acquisition of remodel design images from multiple texts and images. 
Building facades in cities and regions are essential for people's aesthetic perception and understanding of the 
local environment, enabling the recognition and differentiation of specific areas from others. Therefore, 
implementation method of the additional training model based on generative AI in this study, reflecting this, can 
be summarized as follows: 1) collection and pre-processing of image data using Street View, 2) pairing text data 
with image data, 3) conducting additional training and testing with various inputs, 4) proposing relevant 
application methods. This approach can be expected to enable efficient communication of design at an early stage 
of the architectural design process beyond traditional 3D modeling and rendering tools. 


KEYWORDS: Building facade, Generative AI, Local identity, Design alternative, Additional Training 
Model 


1. INTRODUCTION 


Recently, platforms such as 'Midjourney,' 'Dreamstudio AI,' and 'Stable Diffusion’ have been developed and used 
alongside Large Language Model (LLM) based platforms like ‘ChatGPT’ (OpenAI, 2022) to generate images 
using Diffusion models. These platforms are provided in accessible forms for the public, and their interfaces and 
functionalities are consistently updated. These platforms are based on generative artificial intelligence, allowing 
users to easily create desired images creatively by providing prompts and adjusting settings. This generative AI- 
based image creation approach is not only applied in design and art fields but also in various other domains. It is 
also being employed in architecture, generating images of diverse buildings and spatial designs in various styles, 
contributing to applied research. 


In this study, the aim is to apply the image generation capability of generative artificial intelligence to obtain facade 
images of buildings. Furthermore, this involves creating building images with regional design identities, aiming 
to establish an approach for more efficient utilization during the initial building planning and design stages (Relph, 
1976). This approach focuses on commercial buildings, allowing for the swift acquisition of creatively designed 
facade images in the early architectural phases by adjusting the degree of regional identity incorporation. 


The research follows the following methodology: Initially, to evaluate the effectiveness of the image generation 
model, a repetitive process of image generation was conducted, resulting in the creation of a substantial number 
of images for testing. Based on these results, it was evident that additional training of the basic generative AI model 
was necessary. Subsequent steps for this additional training were carried out as follows: 1) Constructing a training 
dataset, 2) Conducting additional training and generating model files, 3) Confirming and utilizing result images 
incorporating the additional training model files. This was executed in the form of additional training utilizing the 
Diffusion-based model. The additional training was built upon LoRA (LoRA: Low-Rank Adaptation of Large 
Language Models), and by adjusting hyperparameters, it was ensured that high-accuracy images were generated. 
Following this, the generated additional training model files were applied to generate and confirm result images, 
suggesting an approach to visualize these images in the early architectural stages. 


2. BACKGROUND 
2.1 Image Generative AI 


Since 2020, diffusion process-based techniques have gained prominence in the arena of deep learning-driven image 
synthesis. These approaches iteratively update pixel values to progressively generate images (Ho, Jain, & Abbeel, 
2020). Concurrently, scholars have immersed themselves in artificial intelligence models that facilitate the 
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transformation of textual data into visual representations, marking significant progress in the domain of image 
generation (Ramesh, Dhariwal, Nichol, Cuy, & Chen, 2022; Saharia, Chan, Sawena, Li, Whang, Denton, ... & 
Norouzi, 2022; Rombach, Blattmann, Lorenz, Esser, & Ommer, 2022). 


While considerable scholarly inquiry has been devoted to deep learning-assisted image synthesis, its potential in 
the realm of architectural design visualization remains largely untapped (Kim, & Lee, 2020). This investigation 
introduces an innovative proposition for architectural design visualization, harnessing the capabilities of Al-driven 
image synthesis models and recognizing their transformative impact in the landscape of image generation. Through 
the application of these advanced machine learning techniques, this section aims to explore novel pathways to 
enhance architectural design visualization via Al-powered image training models. 


With the advancement of the LLM model and the image synthesis technology, the feasibility of producing 
architectural visualization images based on provided textual input has become achievable. Termed as text-to-image 
synthesis, this process possesses the ability to generate highly realistic images, making it a versatile instrument for 
generating a diverse range of architectural visualization content. As AI technology continues its evolution, the role 
of text-to-image synthesis is expected to play a crucial role in the architectural domain. Consequently, the 
integration of Al-driven image synthesis enhances the potential for imaginative exploration beyond traditional 
methodologies. 


2.2 New opportunities for Architectural Visualization 


Architectural visualization, such as photorealistic images, plays a crucial role in enhancing communication within 
the field of architecture (Lee, Lee, Kim, & Kim, 2023). Firstly, photorealistic renderings transcend mere geometric 
massing, enabling architects to vividly convey their design intentions to clients. These images serve as 
intermediaries between architectural drawings and experiential aspects of architectural spaces by presenting 
architectural concepts in a reality-like manner (Kim, & Lee, 2022). Such visualizations facilitate shared 
understanding among stakeholders. Secondly, visualization empowers not only architectural professionals but also 
stakeholders, clients, and the public to grasp architectural visions that transcend architectural terminology and 
technical complexity. Visualized images like photorealistic renders enable individuals to comprehend the 
interaction between planned architectural attributes, ambiance, and the surrounding environment, enabling 
informed decision-making based on information. Transitioning from geometric massing to photorealistic render 
images allows for a more universal and comprehensive communication of intricate architectural concepts, thus 
promoting smoother communication. 


In summary, integrating visualization images like photorealistic renderings into the architectural design process 
enables efficient communication in the early stages of architecture, induces information-based decision-making, 
and enhances creative design. While traditional architectural visualization relied on complex technical processes 
and necessitated GPUs and specialized hardware, leveraging generative AI, as discussed earlier, allows for 
obtaining numerous detailed visualization images effectively without the need for separate GPU renderers. 
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Fig. 1: Overview of the approach proposed in this study. 
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The following section examines the application of such generative artificial intelligence to architecture, exploring 
the potential of generating architectural images. This investigation, as outlined in the introduction, focuses on the 
design aspect of building facades within the realm of architectural elements (Kier, 1984). Specifically, this inquiry 
aims to determine the feasibility of effectively generating architectural visualization images by emphasizing 
regional identity as a pivotal design consideration within building facade design. 


3. TEST ON BASIC IMAGE GENERATION MODELS 
3.1 Test Generative AI Platforms 


Various platforms are being developed using generative artificial intelligence to make it easily accessible for the 
public. These platforms utilize different interfaces and base models, resulting in a range of image generation 
platforms that cater to various user requirements such as freedom of generation, design style of images, sizes, and 
image quality. In this paper, we utilized the commonly used platforms 'Midjourney,'’ 'Dreamstudio AI, and 
‘Playground AI to understand their respective interfaces, directly engage with them, and explore their features and 
specific functionalities. 


Among these three platforms, the latter two platforms, excluding 'Midjourney,' offer partial free usage for image 
generation, with subscriptions or purchases required for more extensive usage. Each interface provides common 
features including the option to select various image styles like 'Enhance,' 'Anime,' 'Photographic,' 'Comic book,' 
as well as the ability to create Positive and Negative prompts. All platforms also offer the functionality to adjust 
specific settings to generate images. Additionally, they provide an "Image-to-Image" feature wherein users can 
input desired images to generate text based on the images, resulting in the creation of different images. By utilizing 
these functionalities, one can quickly generate images tailored to specific requirements. For instance, when aiming 
to acquire building facade images as shown in Table 1, it becomes possible to generate images that incorporate 
more creative ideas. The following section will proceed with an examination of building facade image generation 
through detailed testing, utilizing prompts that encompass greater specificity and domain knowledge. 


Table.1: Investigation of the interfaces of prominent platforms for image generation models and examples of 
generated images (The generated images from Midjourney and Dreamstudio AI are provided by openart 
(https://openart.ai/), while the examples generated by Playground AI are based on similar prompt-based 
approaches). 


Dreamstudio AI Playground AI 


Midjourney 


— 
| ede 


Web Interface 


INPUT Key Prompt Building Facade Image 


OUTPUT Generated 


Images 


3.2 Testing of Facade Image Generation Reflecting Local Design Identity 


In this section, we aim to investigate whether it is possible to generate facade design images that reflect regional 
identity using generative artificial intelligence. To achieve this, we conducted image generation tests based on text 
prompts using the existing basic model grounded in Diffusion. The tests were divided into three main categories: 
facade images of buildings without region-specific text input, facade images of buildings reflecting Korean style, 
and facade design images of commercial buildings in Manhattan. The goal was to compare the generated images 
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for these three categories. For each category, we utilized key prompts such as "Building Facade," "Building Façade 
reflects Korean style," and "Building Façade reflects Manhattan style." Additionally, we employed prompts to 
enhance image quality to generate results like those in Table. 2. 


By utilizing the existing generative artificial intelligence-based model, it was observed that when region-related 
text prompts were input, corresponding images could generally be generated. However, this primarily resulted in 
localized images, and it was found that the generated facade design images did not exhibit diverse variations 
reflecting the unique images associated with each region. For instance, in the case of Korean facade images, 
predominantly images of buildings featuring traditional Eastern style hanok architecture were generated. Therefore, 
in the subsequent section, we proceed to construct a model through fine-tuning of the existing generative artificial 
intelligence model, aiming to determine if image generation with a focus on regional facade design identity can be 
achieved. 


Table. 2: Example of generating building facade images with regional names using the basic generative AI model 


No. Key Prompts Generated Images 


alii 


1 Building Facade 


Building Façade reflects 


Korean style 


Building Façade reflects 
Manhattan style 


4. CONSTRUCTION AND UTILIZATION APPROACHES OF THE ADDITIONAL 
TRAINING MODEL 


4.1 Additional Training and Testing of Local Facade Design Identity Model 


In this section, we aim to investigate the generation of facade design images that reflect regional identity by 
conducting additional training of a generative artificial intelligence model within the scope of the target region. 
Model construction utilized the Diffusion-based model implemented on the foundation of LLM (Large Language 
Model) for additional training. This additional training process can be summarized into three main stages: 1) Data 
Preparation, 2) Model Training, and 3) Image Testing and implementation. Data preparation involved pairing 
image and text data. For efficiency in image data collection, street-view functionality from portal sites API was 
employed, as described earlier. However, the distorted nature of 360-degree panorama images from street-view 
led to generating indistinct façade images, lowering image quality and accuracy. To address this, image 
preprocessing was conducted to correct distortions, resize images to a consistent size, and then pair them with text 
data to compile the dataset. 


For model training, the LoRA (Low-Rank Adaptation of Large Language Models) approach was adopted to 
facilitate additional training of the Diffusion model (Hu, Shen, ...& Chen, 2021). LoRA allows for rapid additional 
training of existing large-scale models within a short timeframe, without significant demands on GPU performance. 
Unlike other methods, LoRA generates relatively smaller additional training model files and offers the advantage 
of easily assessing style incorporation through adaptability changes in the model files. Thus, in this research, LORA 
is employed to construct additional training models, optimizing hyperparameters to generate highly accurate 
images with minimal distortion. The optimization of hyperparameters, including adjustments to epochs, training 
batch size, and caption extensions, aims to enhance the accuracy and quality of the resulting images. 
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Fig. 2: Construction Process of the Additional Training Model 


When conducting additional training using LoRA, model files with the extension ".safetensors" are generated. 
Inserting these generated model files into the model management folder of the Stable Diffusion Web-UI enables 
the models to function in the format of a text prompt, allowing the generation of desired images alongside the text 
data used for training. Furthermore, by adjusting the adaptability of the generated model files, a wide array of 
creative design images can be produced. Applying the additional training model file created using exterior images 
and text data of commercial buildings in the Seoul area, according to different weight values, results in images as 
shown in Table 3. When applying a weight of 0.1, images of buildings with views from different angles beyond 
the front facade are generated. As the weight approaches 1.0, images distinctly reflecting Seoul's facade design 
style are generated. 


Table. 3: Test of Additional Training Models according to each weight 


Weight Generated Images 


0.1 


0.5 
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1.0 


4.2 Utilization Approaches of the Additional Training Model File 


In this section, we demonstrate one example of an approach that can be applied in the early stages of architecture 
using the constructed additional-trained model files. We validated the images that could be generated by applying 
the model files using actual facade images of buildings in Seoul. When applying this method and providing detailed 
prompts, it was observed that images reflecting Seoul's facade design style could be generated. 


Table. 4: Image generation from Each Input Image 


A B C 
INPUT Key Prompt Building Façade reflects Seoul style 
Detailed Prompt Modern design style An arched window Red brick finish 
Utilized Model file Building Façade Design Style of Seoul.safetensors 
Images ; 
OUT- Generated Images 


PUT 


5. CONCLUSION 


In the initial design stages of existing buildings, facade design plans have traditionally relied on manual efforts by 
designers and architects, or methods involving 3D modeling tools and high-performance GPU renderers. These 
methods have necessitated repetitive tasks to facilitate communication with clients. This study discusses an 
approach that leverages the recent advancements in generative artificial intelligence, which is being actively 
applied in related fields, to generate facade design alternatives using image generation AI. Within the context of 
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this research, we propose an approach that enables quick confirmation of building facade design plans reflecting 
regional facade identity in the early design stages and the generation of numerous alternatives. 


According to the approach proposed in this study, it was confirmed that utilizing image generation AI can rapidly 
confirm building facade design plans, incorporating regional facade identity, and produce a multitude of 
alternatives. This approach was demonstrated through applying Seoul's facade design style using actual building 
images to showcase its effectiveness. Consequently, exceptional visualization images were generated. 


Although there may be limitations in this study, particularly in constructing a fine-tuned model focused on Seoul, 
it holds significance in its potential to create and explore more diverse and domain-specific models using this 
methodology. This opens the door for further application-oriented research, leveraging more specific 
characteristics and domain knowledge to refine the approach. 
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ABSTRACT: Early failure detection and abnormal data reconstruction in sensor data provided by building 
ventilation control systems are critical for public health. Early detection of abnormal data can help prevent failures 
in crucial components of ventilation systems, which can result in a variety of issues, from energy wastage to 
catastrophic outcomes. However, conventional fault detection models ignore valuable features of dynamic 
fluctuations in indoor air quality (IAQ) measurements and early warning signals of faulty sensor data. This study 
introduces a hybrid framework for early failure detection and abnormal data reconstruction applying variance 
analysis and variational autoencoders (VAE) coupled with the long short-term memory network (VAE-LSTM). The 
periodicity and stable fluctuation of IAQ data are exploited by variance analysis to detect unusual variations 
before failure occurs. The IAQ dataset which is corrupted by introducing complete failure, bias failure and 
precision degradation fault is then used to verify the feasibility of the VAE-LSTM model. The results of variance 
analysis reveal that unusual behavior of the data can be detected as early as 12 hours before failure occurs. The 
reconstruction performance of the developed method is shown to be superior to other methods under different 
abnormal data scenarios. 


KEYWORDS: Early failure detection, Abnormal data reconstruction, Variational autoencoder (VAE), Long short- 
term memory network (LSTM), Sustainable IAQ management 


1. INTRODUCTION 


Indoor air quality (IAQ) in public buildings is regarded as a hot study topic as it has a big impact on human health. 
Recent research has shown a connection between indoor air pollutants, including CO2, with health effects and 
academic performance (Szabados et al., 2022). According to the EPA's IAQ tools for schools (EPA, 2009), CO2 
concentrations in schools should adhere to the ASHRAE standard 62-2001 limit of 700 ppm over the outdoor 
concentration (just above 1000 ppm overall) for CO2 concentrations. Besides various laws and regulations, there 
is a need for continuous monitoring of IAQ, which includes the installation of sensors to detect anomalous events 
that may have a detrimental impact on the IAQ. Sensors are generally placed on walls or ceilings to collect hourly 
levels of pollutants, such as CO2, NO», and particulate matter (PM), which are small and aerodynamic. In addition, 
sensors can also collect relative humidity and temperature data. These monitoring sensors are important in the 
management of ventilation systems. Unfortunately, hardware sensors can encounter various issues, such as bias 
and precision degradation. In addition, they may experience data loss due to environmental or operability issues, 
which results in their measurements being unrealistic (Kim, Liu, Kim, & Yoo, 2014). When air quality is not 
monitored properly, it can lead to a decrease in IAQ levels. On the other hand, overestimation of the levels of 
pollutants can cause energy wastage. For these reasons, an effective method for early failure detection and 
reconstruction of faulty IAQ sensors can help increase the uptime of ventilation management. 


Some investigations have used statistical methods for abnormal data reconstruction (Kasam, Lee, & Paredis, 2014; 
Ouyang, Zha, & Qin, 2017). Although statistical methods are easier to implement and work well when there are 
few abnormal data, their performance is constrained as the data complexity increases. Additionally, the majority 
of statistical techniques rely on linear assumptions, which are incompatible with nonlinear real-world situations. 
Traditional machine learning approaches can use the whole data set to understand the patterns of failure 
performance in order to solve this problem. Unfortunately, they require a lot of manually classified anomalies to 
learn a predictor from given observations (Wang, Feng, & Liu, 2021), and due to the failure of unanticipated 
patterns of learning, such methods have poor performance (Bu et al., 2018). The emergence of neural methods 
without labelled information that is capable of handling non-linear data is a major factor that has led to the 
increasing number of applications of deep learning in process monitoring. 
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Time series prediction using deep learning methods, especially the long short-term memory neural network 
(LSTM), has achieved significant achievements in recent years (X. Li et al., 2017; Qing & Niu, 2018; Su & Kuo, 
2019). A hybrid convolutional neural network and long short-term memory model (CNN-LSTM) was used to 
impute missing values in time-series datasets for air-conditioning appliances (Hussain et al., 2022). The hybrid 
technique outperformed the CNN and LSTM variants in terms of performance. Ma et al. suggested a hybrid Bi- 
directional Imputation method using an LSTM model and Transfer Learning to fill the gaps in the energy 
consumption data (Ma et al., 2020). Transfer learning was utilized to prevent network saturation problems while 
the basic model was pre-trained on data from a comparable building. The performance demonstrated that the 
suggested architecture could successfully handle various scenarios with missing data, including continuous and 
random missing data. The developed strategy, however, was predicated on the prior assumption of source and 
target data collected on sufficiently similar buildings. In general, LSTMs are commonly used in a wide range of 
applications due to their ability to model non-linear dependencies. However, the prediction performance of LSTM 
can also be sensitive to the anomalies in the input due to its non-linear nature. Our proposed approach is to make 
sure that the input of the LSTM prediction network contains as little abnormal data as possible. Therefore, a method 
for abnormal detection and reconstruction in time series is necessary. However, existing abnormal detection 
methods require a lot of manually labelled abnormal observations. Some methods, which are based on 
unsupervised learning, are used for time-series abnormal detection to address these concerns by concentrating on 
normal patterns rather than anomalies (Breunig, Kriegel, Ng, & Sander, 2000; Cao, Nicolau, & McDermott, 2016; 
Erfani, Rajasegarar, Karunasekera, & Leckie, 2016). Unfortunately, due to the failure of unanticipated pattern 
learning, such discriminative modeling-based techniques still necessitate a significant number of normal 
observations and have low accuracy (Bu et al., 2018). 


Traditional fault detection methods based on supervised learning require sufficient training data (D. Li, Zhou, Hu, 
& Spanos, 2016; Zhao, Li, Zhang, & Zhang, 2019). However, the amount of data is usually insufficient in reality, 
because it is difficult to get high-quality training data sets for each type of failure. Yan et al. proposed a semi- 
supervised fault detection method, which only uses a small amount of data to detect the failure of the air- 
conditioning unit (Yan, Zhong, Ji, & Huang, 2018). However, it is limited only when the same failure occurs again. 


Recently, there has been a rise in deep generative modeling techniques that can be used for detecting anomalies. 
Autoencoder (AE) is a powerful deep learning technique that is appropriate for failure diagnosis with limited fault 
data since it can learn data features, avoiding the dependence on failure data (Zhang, Jiang, Zhan, & Yang, 2019). 
In addition, AE is a crucial tool of non-linear process monitoring as it can handle the encoding of input data and 
the extraction of features to provide meaningful representations of data in various applications, such as failure 
detection and data reconstruction. Variational autoencoder (VAE) technology has been shown to have benefits over 
conventional AE architecture. Both VAE and AE architectures can compress data from high-dimensional space to 
low-dimensional space (also known as latent space) and reconstruct complicated data. The main difference 
between VAE and regular AE architectures is that the former has a continuous latent space, allowing it to learn the 
distribution of data and reconstruct new information, which is crucial for process monitoring. However, as VAE is 
not a sequential model and cannot handle long-term dependencies in time series, it is possible to combine a 
sequential modeling approach such as LSTM models with VAE to solve this issue. Lin et al. proposed a hybrid 
VAE-LSTM model which can detect anomalies on multiple time scales (Lin et al., 2020). The VAE module forms 
local features on brief windows, while the LSTM module estimates the sequence's long-term correlation. However, 
if there is no abnormal data in the dataset, the hybrid VAE-LSTM model is not suitable as a means of prediction, 
as it will increase the computational complexity and cost. Thus, it is possible to make both the VAE-LSTM and 
the LSTM alone train independently and be exchanged if needed. 


To effectively address the issue of sensor faults, a comprehensive framework with early fault detection and 
reconstruction techniques is required. However, the combination of fault data reconstruction and early failure 
detection is rarely reported. Previous publications have demonstrated that early failure detection has a variety of 
applications, including analysis of climate pattern change (Drake & Griffen, 2010; Rogers et al., 2018), credit risk 
diagnosis (Ali & Dağtekin, 2008; Lu, Shen, & Wei, 2013), and early failure detection of key system components 
(Lee, House, Park, & Kelly, 1996; Yu, Woradechjumroen, & Yu, 2014). As introduced earlier, the increasing 
popularity of VAE in fault detection also makes it a new approach in early fault detection. The ball screw 
degradation assessment method used in (Wen & Gao, 2018) is similar to the one used in the manufacturing industry. 
The assessment shows that the deterioration of a ball screw can be evaluated using the Variational Autoencoder 
Reconstruction Error (VAERE). Malfunctions in an air handling unit (AHU) were studied in (Mesa-Jiménez, 
Stokes, Yang, & Livina, 2021), in which the VAERE was used to reproduce the sudden change of temperature 
before the fault occurred. This is because VAE can model the underlying probability distribution of the input, 
especially when processing a time sequence with a typical periodic pattern. In the case of failure, the periodic 
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characteristics of the time sequence will be destroyed. Therefore, the reconstruction error of the VAE can be used 
to observe the unusual behavior in the time series data. However, there are still issues that the existing literature 
does not address. Firstly, some studies adopt a reactive approach rather than a proactive approach. When a system 
fails, it often leads to service interruptions and necessitates engineers to temporarily shut down certain pieces of 
equipment in order to remedy the problem. Secondly, the probability of misjudgment of the failure diagnosis by a 
single indicator is high, and it is more scientifically correct to use multiple indicators for early warning. Therefore, 
it is necessary to develop an active multi-index method for early failure detection. 


To solve the problem of sensor faults including fault detection and reconstruction, a sustainable and real-time IAQ 
monitoring framework is proposed, which mainly focuses on early failure detection, failure data reconstruction, 
and assessment of the impact of failure data on ventilation performance. Early fault detection mainly utilizes the 
periodic characteristics of IAQ time series data and conducts variance analysis on reconstruction errors to monitor 
early warning failure signals. The reconstruction model combines the VAE architecture and Long Short-Term 
Memory neural network (LSTM). The purpose of integrating the two structures in this study is to extract data 
features according to the dynamic characteristics and nonlinear dependence of IAQ data, so as to reconstruct 
abnormal data. The contributions of this study are described in detail as follows: 


e A proactive early failure detection method is proposed for IAQ time series data. Taking advantage of the 
periodic and stationary fluctuation characteristics of [AQ data under normal operating conditions, the unstable 
behavior of the raw data before the failure is reproduced using variance analysis. The variance analysis is 
applied to the reconstruction error of VAE to check the fluctuation of IAQ data indicating where the failure 
has already occurred. Therefore, the engineers can find the potential failure and carry out maintenance, when 
necessary, before these failures actually happen. 


e When an anomaly is detected, the reconstructed data using VAE-LSTM replaces the abnormal data. The 
restored data is then fed into the LSTM neural network to forecast the time series. Thus, both the hybrid VAE- 
LSTM and the LSTM may be learnt independently and replaced as needed. The VAE-LSTM is developed by 
using the normal IAQ measurement data. Given that IAQ often exhibits changing patterns over time, the time 
variable, Hour, is translated into one-hot encoders as conditional information. For example, the IAQ in a 
restaurant typically present dramatic differences during meal hours and non-meal hours, and the time variable 
Hour can be used to provide additional conditional information. Therefore, Hour, which can be written by 
one-hot encoding vectors, is supplied as an input to both the encoder and decoder to provide additional 
controls over the process of data generation. 


e To verify the superiority of the proposed method over other neural approaches, different types of abnormal 
data are presented in the test dataset: the IAQ dataset is corrupted by introducing complete failure, bias failure 
and a precision degradation fault. 


The rest of the work is organized as follows: Section 2 provides the dataset used, the method description, steps in 
network training and explanations of the validation performance analysis. Section 3 compares the performance of 
the proposed method to other methods. Section 4 discusses the conclusions and limitations. 


2. MATERIAL AND METHODS 


In this section, a framework for early failure detection and fault data reconstruction is designed based on VAE, as 
illustrated in Fig.1. Firstly, the variance analysis is applied to find the abnormal fluctuations of IAQ data before 
the failure occurs. Once the abnormal signal is detected, the proposed VAE-LSTM hybrid model is applied to 
reconstruct the abnormal data. To verify the superiority of the proposed reconstruction model, various scenarios 
of abnormal data are introduced into the test dataset. The remainder of this section illustrates the detailed 
procedures. 
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Fig. 1: Framework of this study 


2.1 Data collection 


The Facilities and Management Office of the Hong Kong University of Science and Technology (HKUST) 
provided the data used in this study. The dataset recorded IAQ data of various types of campus buildings, including 
canteens, library buildings, lab buildings, etc. Among them, the canteen exhibits large IAQ oscillations caused by 
obvious variations of pedestrian flow. Furthermore, during the peak dining hours of the canteen, pedestrian flow 
increases significantly, and indoor pollutants like CO2 concentration can sometimes exceed the standard indoor 
concentration of 1000 ppm. This necessitates more precise ventilation management system control, and high- 
quality CO2 sensor data are required for achieving this control. Therefore, we chose the CO2 concentration of the 
canteen as an example to test the proposed methodology. The chosen time period included holidays, non-holidays, 
and the final exam period, which brings certain challenges for data analysis. In holidays, the behavior of occupants 
will be different from regular days, and the number of people during peak hours will be significantly reduced. 
These variations will affect the data patterns of indoor pollutants such as CO2. Typical temporal models are hard 
to adapt, resulting in error-prone predictions. Therefore, external features need to be added to provide additional 
clues to the temporal model to maintain high prediction accuracy when dealing with changes in holidays and 
examination periods. 


2.2 Early failure detection 


In order to analyze the early warning signals of the sensor data generated by the ventilation management system 
and give the engineering maintenance personnel sufficient time to repair the failure, a fault detection technique, 
i.e., variance analysis, is applied to the reconstruction error of the VAE-LSTM model. The early warning indicator 
is applied to the time series with failures through a selected sliding window. The choice of sliding window length 
is a compromise between the time resolution and the clarity of transitional signal changes. 


The variational autoencoder (VAE) is an algorithm for stochastic variational inference and learning using neural 
networks as the recognition model (Kingma & Welling, 2013). The reconstruction error of VAE can be calculated 
for abnormal detection. The idea underlying abnormal detection is that the VAE is not able to reconstruct 
unpredictable patterns or noise as well as it can regular data. Therefore, when x; in a given time series i is 
reconstructed by VAE, the error between the output £; and input of abnormal data is significantly larger. Variance 
is used to measure the degree of fluctuation of a set of data. Variance analysis is very straightforward to use and 
does not require specialized knowledge because it is a simple failure detection approach. The goal of this study is 
to integrate the reconstruction error based on VAE model with variance analysis to identify out-of-law abnormal 
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fluctuations in IAQ data in advance, described as: 


e; = Xj îi (1) 
2_ Ale -ey 
w=- ae | (2) 


where n is the number of observations in a sample, a? is the sample variance, e; is the reconstruction error for 
each input, ē is the mean value of all observations in the sample, and x; and %; are the actual and the 
reconstructed output, respectively. Therefore, the VAE indicator is derived from the reconstruction error, which is 
referred to as the variational autoencoder reconstruction error (VAERE). 


2.3 Model development 


For faulty data reconstruction and missing data imputation in ventilation control systems, a technique that can 
effectively handle complicated and failure data is required. This work benefits from combining the representation 
learning capabilities of deep generative models—in the form of variational autoencoders (VAEs)—with the 
temporal modeling capabilities of long short-term memory (LSTMs) to manage long-term time sequence data and 
generate accurate data based on intrinsic distributions. To train the proposed VAE-LSTM model without 
supervision, the dataset needs to be divided into a training set and a test set, with a continuous segment containing 
no anomalies serving as the training data and the remaining time series containing anomalies used for evaluation 
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Fig. 2: The proposed VAE-LSTM architecture 


The design process of the proposed model is as follows. Firstly, collect the IAQ training dataset without faulty and 
missing readings, and we introduce some fault data and missing intervals (with a fixed length and proportion) on 
the dataset to validate the model performance. In addition to sequences with faulty or missing data as the main 
input of the VAE-LSTM model, sequences of integers that encode the additional information provided as 
categorical features, such as month, weekday, hour, and holiday, serve as the second meaningful inputs. Due to the 
characteristics of buildings with significantly varying occupancy patterns at different hours, and to make the 
predictive model more concise, the variable Hour is taken as additional information for IAQ sequence 
reconstruction. The original IAQ data and the categorical feature which is transformed by the embedding operation 
are concatenated into the LSTM layer to capture the relationship between temporal features. The ReLU is selected 
as the activation function of the LSTM layer. The output of the LSTM goes through a dense layer with a non-linear 
activation function. It then generates a 2D output, just like every other encoder in a VAE architecture, which is 
used to approximate the mean and variance of the latent distribution. The decoder takes samples from the 2D latent 
distribution upsampling and then concatenates the generated sequence with the original categorical embedding 
sequence to provide more control over reconstructing the original IAQ sequence. LSTMs and dense layers with 
ReLU activations constitute the rest of the decoder structure. The training of VAE-LSTM adopts the early stopping 
training mechanism to minimize the combination of reconstruction loss and distribution loss. The patience was set 
as 10. Specifically, the training process will end if the model loss does not decrease after 10 iterations. Adam was 
chosen as the optimizer as it provides the best convergence (Kingma & Ba, 2014). The hyperparameters were 
chosen by fine-tuning the VAE-LSTM structure. The best hyperparameters were selected based on their 
performance in fault data reconstruction and missing data imputation. 


The reconstructed sequence is utilised for time series prediction by the LSTM neural network after the abnormal 
data is replaced by the VAE-LSTM output. The prediction module consists of one layer of LSTM and one dense 
layer. Grid search is used to optimise the model architecture and hyperparameters. Input and output temporal 
dimensions are the same. Mean squared error (MSE) is used as the loss function throughout the training process, 
which was performed with 500 epochs, and the Adam optimizer with a learning rate of 0.001. 
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2.4 Validation scenarios 


Since anomalous events are rare, it is usually not feasible to collect sufficient abnormal data for detailed 
characterization. Therefore, to assess the effectiveness of the proposed framework, we designed different types of 
abnormal scenarios with fixed lengths and proportions on the test set, with the key benefit of creating an arbitrary 
amount of abnormal data while using the original data as a ground truth. The downside of this procedure is that it 
may overfit the abnormal data or provide worse results for real anomalies. As a result, only real data is utilized to 
train the model, whereas anomalous data is solely used to evaluate it. 


A detailed explanation of the abnormal scenarios is as follows. Anomalies such as gain and offset of sensor signals 
may arise due to incorrect calibration or mechanical wear over a period of time. We attempt to simulate three types 
of typical sensor faults: 1. Complete failure: the size is assumed to be twice the average concentration of the 
original data; 2. Bias failure: the size is assumed to be twice the original faulty data segment; and 3. Precision 
degradation fault: the size in the temporal dimension is taken as the average and standard deviation of the original 
data. 


3. RESULTS AND DISCUSSION 


We now employ the methodology described in Section 2 for the collected sensor data for analysis. Table 1 presents 
a Statistical summary of the data in this study. Real faulty data are utilised to evaluate the effectiveness of variance 
analysis in early failure detection. Abnormal data scenarios are then introduced to evaluate the reconstruction and 
imputation performance of the proposed method against other approaches. 


Table 1: The basic statistics of variables. 


Attribute Content 

Variable CO: concentration 

Time period From 2021/11/08 13:00 to 2022/02/19 22:00 
Unit ppm 

Resolution Hour 

Mean 508.56 

Minimum 412 

Maximum 844.7 

Standard Deviation 77.67 


3.1 Early failure detection analysis in IAQ measurements 


We applied variance analysis to analyze the failure of the CO2 sensor in the indoor ventilation control system, 
which resulted in abnormal changes in CO2 concentration up to 1000 ppm instantaneously. Therefore, the purpose 
of applying variance analysis in this study is to detect this anomaly before it occurs. Figure 3(a) shows one week 
of CO: data containing the failures, with an abnormally high CO2 concentration. 


The analysis results are shown in Figure 3, where the collected CO, data is presented together with the analysis 
results. For convenience, the variance on the Y-axis is represented on a logarithmic scale. We used different 
windows to obtain early failure signals, with a 14-hour window when applied to CO2 data and a 23-hour window 
when applied to the reconstruction error. The choice of window size is based on the clarity of the provided signal. 
It is evident from the figure that the variance results for the normal data are periodic, while unexpected fluctuation 
patterns appear before the failure. When the variance is applied to the reconstruction error as shown in Figure 3(b), 
the failure signal is generated about 12 hours prior to the failure, and the reconstruction error gradually increases, 
which shows the unexpected fluctuation pattern before the failure. Therefore, early failure signals give time for 
the maintenance engineers to make the necessary adjustments and repairs before the failure actually occurs. 
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Fig. 3: Variance analysis of the COz2 sensor failure. 


3.2 Reconstruction performance of the IAQ measurements 


As discussed in previous sections, three types of abnormal data scenarios are used to compare the performance of 
the proposed method to that of other approaches. Due to sensor ageing, damage, poor working environmental 
conditions, etc., a number of sensor failures may occur, with complete failure and bias failure being the most 
common. To evaluate the reconstruction performance of AE-based sensor faults, we introduced different kinds of 
faulty data in the test dataset. The magnitude size of each sensor failure is described in Section 2.4. The rate of 
fault data was set to 0.5, and the fault data lasting 3 days were randomly inserted into the test set. Figure 4 shows 
a section of CO2 data containing fault data segments and the reconstructed results. Table 2 demonstrates the fault 
data reconstruction results of the AE-based model for different faulty data. Root mean square error (RMSE) and 
mean absolute error (MAE) are used as metrics to measure the capability of the AE-based model of reconstructing 
the fault data. The largest value of RMSE is 43.197 ppm calculated by the standard AE model. When the encoder 
and decoder structures are designed using LSTM, the reconstruction performance improves by up to 17%, which 
proves that LSTM can capture the nonlinear and autocorrelated relationships of CO2 data. In addition, the 
reconstruction model based on VAE provides better capability for fault data reconstruction. This is because VAE 
can solve the problem of non-regularized latent space in the encoder and provide generation capability for the 
whole space. The encoder of AE produces the vectors in the latent space, while VAE outputs the distribution in the 
latent space for each input, adding a constraint on that distribution to convert it to a normal distribution, and this 
constraint guarantees that the latent space is regularized. As a result, the VAE-LSTM reconstruction accurately 
forces the faulty data to normality. 


Table 2: Reconstruction performance of different approaches under different types of fault data. 


Fault data reconstruction performance 


Reconstruction Complete failure Bias failure Precision degradation fault 
methods RMSE (ppm) MAE (ppm) RMSE (ppm) MAE (ppm) RMSE (ppm) MAE (ppm) 
AE 39.441 28.105 43.197 30.126 39.789 28.696 
AE-LSTM 37.343 26.008 36.882 25.867 36.800 25.814 
VAE-MLP 34.370 24.111 33.228 26.104 32.712 25.087 
VAE-LSTM 32.153 24.508 31.133 18.212 27.133 17.081 
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Fig. 4: Reconstruction performance of the VAE-LSTM model under interval-based fault scenarios 


3.3 Discussion 


This study presents a method to reconstruct IAQ data since abnormal data such as bias failure, complete failure, 
precision degradation fault often occur due to sensor malfunctions. One of the main contributions of this paper is 
the development of the VAE-based model for reconstructing various types of abnormal data, including LSTM 
configurations that properly depict indoor environmental patterns. The time series of CO2 concentration, in 
particular, has periodic peaks and extremes, which is a complexity to consider when developing models employing 
LSTM. Furthermore, model training should be done offline to guarantee that the network has sufficiently learned 
the basic parameters in order to provide optimal learning performance for IAQ data and achieve accurate abnormal 
data reconstruction. 


In addition to the reconstruction of indoor CO: data, the VAE-LSTM model developed can be generally applied to 
other tasks involving periodic abnormal data processing, such as indoor crowd and energy consumption. In fact, 
there is a link between indoor CO? concentration, indoor crowd, and energy consumption. CO2 concentration can 
be generally used as a proxy indicator to assess whether indoor space is occupied and whether indoor crowd affects 
energy consumption. Our proposed VAE-LSTM approach encodes categorical features, such as months, weekdays, 
hours, and holidays into integer sequences as auxiliary information to capture the periodic patterns of time series 
data, and the original categorical sequences are connected to the generated sequences of the decoder to provide 
more control over the process of reconstructing and imputing the sequences. The flow pattern of a human crowd 
and fluctuations of energy demand have similarities with fluctuations in indoor CO2, and both follow a cyclic 
pattern, so our proposed VAE-LSTM method can also be used to constitute a model for processing crowd and 
energy demand from abnormal data. 


In contrast to other studies that only utilize VAE-based models for abnormal detection, this study incorporates 
variance analysis to detect non-periodic abnormal signals in IAQ data in advance, and early failure detection can 
prevent problems in critical parts of the Heating Ventilation and Air Conditioning (HVAC) system. For example, 
the case in this study is the indoor CO2 concentrations at a university restaurant. When abnormal signals are 
detected using our proposed approach, the restaurant manager can contact engineers to check the system in time. 
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Even if part of the system facilities is shut down, the restaurant manager can prepare backup ventilating equipment 
in advance to ensure that the customers can enjoy their meal in a good indoor environment, especially during peak 
hours. 


4. CONCLUSIONS 


A neural approach, consisting of a variational autoencoder and long-short-term memory network (VAE-LSTM), 
was developed for early detection and reconstruction of malfunctioning sensors in HVAC systems in order to 
improve the reliability of the sensors in indoor environment control. Taking advantage of the periodicity and stable 
fluctuation characteristics of IAQ data, the results of variance analysis on reconstruction errors reveal that unusual 
behavior of the data can be detected as early as 12 hours before failure occurs. The abnormal data are then 
reconstructed using the developed VAE-LSTM model. The validation is carried out by introducing different types 
of abnormal data on the CO2 sensor. The superiority of the VAE-LSTM was then illustrated by comparing the 
developed approach to other methods. 


However, for an approach dealing with faulty sensors, an explanatory function of fault locations and causes should 
be provided to the on-site engineer in order to avoid time-consuming proactive repairs, and the knowledge-based 
method or expert rules can meet this requirement and capability. Therefore, in our future work, we will provide 
rational explanations for system failures by combining analytical-based, knowledge-based and data-driven 
approaches and apply them to fault detection and diagnosis of ventilation control systems, especially for large- 
scale building systems. 
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ABSTRACT: This paper aims to quickly and precisely visualize remodeled design images based on image 
generation AI so that they can be used as alternative images in the early stages of design. In order to create a 
space image suitable for the user, the contents of the text are proceeded as follows. Bathrooms with many accidents 
in the space were selected as the target space, and users were designated as elderly people with many physical 
changes. Learning image data for additional training was self-generated according to the user's body 
characteristics, and the learning data focused on musculoskeletal aging among the body characteristics of elderly 
users. When the image was generated using additional training models, it was confirmed that a meaningful spatial 
image was created for musculoskeletal aging users, and it can be expected that the spatial image for spatial 
remodeling can be obtained quickly and accurately without the help of experts through subsequent studies to make 
it easier for general users. 


KEYWORDS: Generative AI, Physical Characteristics, Elderly-friendly Bathroom, Detailed Modeling 


1. INTRODUCTION 


This paper explores the integration of artificial intelligence (AI) technology into image generation during spatial 
remodeling and initial design phases. It aims to automate visualization for promoting safe space utilization by 
considering user's physical characteristics and to investigate various practical applications. Spatial visualization 
plays a crucial role in conveying design concepts and ideas visually to clients. However, generating visualizations 
for architectural spaces requires a significant amount of time and effort, and one alternative is to leverage image- 
generating artificial intelligence. By utilizing image-generating AI, detailed user input regarding spatial 
requirements can lead to the generation of corresponding space images. 


To design safe spaces, it's essential to establish environments suitable for users and based on professional expertise, 
ensuring safety. However, this design process can incur costs such as labor expenses. Nonetheless, using generative 
AI allows for the efficient generation of trustworthy alternative space images by utilizing models trained on 
abundant data. Therefore, this study investigates an extended visualization approach in the field of architecture 
through image-generating AI. It focuses on generating a variety of personalized visualization alternatives for users, 
rather than presenting standardized alternatives. 


2. BACKGROUND 
2.1 Image Generation AI 


The advancement of intelligent computing technology has brought about innovative changes in research 
methodologies and approaches within the field of architecture. Tools such as the architectural design assessment 
tule-checking system (Eastman, et al. 2009) and the spatial data-based building design review system (Lee, et al. 
2012) have also been utilized. However, more recently, the rise of image-generating artificial intelligence (AI) 
technology has introduced significant transformations in the architectural realm. While previous studies primarily 
focused on the application of AI algorithms for predicting and optimizing architectural elements such as building 
appearance and interior composition, the present landscape is marked by image-generating AI technology 
providing fresh perspectives on visual representation and design in architecture. This influence extends not only 
to the architectural domain but also spans various other fields. For instance, within the medical sector, image- 
generating AI has been employed to analyze medical images and medical knowledge-based imagery (Kather, et al. 
2022). Similarly, in the realm of arts, image-generating AI has found utility as a tool for creative artwork generation 
(Beyan, et al. 2023). This multifaceted application underscores the integration of image-generating AI across 
diverse domains, fostering ongoing research endeavors. 


Furthermore, the focus of research has been directed towards leveraging artificial intelligence for performance 
optimization within the architectural context. Models powered by AI algorithms empower architects to experience 
energy efficiency within designs before the commencement of construction. This approach not only streamlines 
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the design process but also contributes to sustainable and user-centric outcomes, fostering anticipations of 
substantial contributions. In essence, the amalgamation of AI technology with architecture is shaping novel 
research directions and enhancing both design methodologies and the conceptualization of architectural spaces. 


2.2 Design of a space reflecting physical characteristics 


Occupants continually modify and inhabit spaces to enhance comfort. Shifting demographics, family compositions, 
and evolving space roles often prompt interior modifications. Precision in incorporating users' physiological and 
psychological data into designs can significantly improve the quality of life by enhancing safety, usability, and 
independence (Demirbilek & Demirkan, 2004). For instance, designing spaces for individuals with disabilities 
necessitates collaboration between experts and users, ensuring their unique needs are met (Imrie, 2004). Similarly, 
creating spaces for children requires considerations such as play areas, noise control, and equipment tailored to 
their needs (Evans & Moch, 2003). Moreover, equipment scale within a space often deviates from conventional 
dimensions to harmonize with users' specific requirements. This departure from standardization not only influences 
the spatial arrangement but also distinctly shapes the manner in which objects are engaged within these 
environments. 


Within this multifaceted framework, it becomes acutely clear that the alignment of spatial configurations and 
amenities with the diverse physiological attributes of users is an imperative. This thematic focus is central to the 
present study, which delves into the meticulous curation of spatial layouts and equipment, meticulously calibrated 
to resonate with the myriad physiological attributes presented by users. This comprehensive endeavor stands as 
the foundation for establishing environments centered around the user, thus aiding in improving the overall quality 
of life. 


3. AI-BASED SPATIAL IMAGE GENERATION 
3.1 AI-Based Image Generation Test 


To enhance satisfaction and facilitate convenient usage of spaces, appropriate improvements are essential. To 
achieve these improvements, a clear understanding of the users of the space is crucial. Furthermore, it is necessary 
to incorporate the layout and facilities of the space based on the users! physical characteristics. To generate spatial 
images based on these physical characteristics and reflect the users' desired points of improvement, a focused test 
was conducted. We tested image generation using the text-to-image functionality of the AI platform named Stable 
Diffusion (SD), which utilizes a Diffusion model. This involves using a deep learning model to generate images 
based on natural language input (Zhang, et al. 2023). In order to utilize SD, relevant prompts need to be formulated 
for the desired images. These prompts are categorized into Positive Prompts and Negative Prompts. Positive 
Prompts are crafted to enhance space types and image quality, while Negative Prompts are designed to prevent 
image degradation or errors. The image generation test focused on residential bathroom spaces, which often 
experience numerous accidents. In this test, not only were typical bathroom images in Korea generated, but also 
images tailored to the physical characteristics of elderly users, in an effort to ascertain if these factors were being 
considered. For this purpose, the generation of bathroom images was performed using image generation AI, with 
specific attention to bathrooms in domestic settings, particularly bathrooms prone to accidents. To ensure the 
consideration of users' physical characteristics, bathroom images that catered to the needs and safety of elderly 
individuals were generated alongside conventional bathroom images. 


Table 1 Generating images by text 


A basic bathroom with a white porcelain toilet, a sink with a chrome faucet, and a standard bathtub 
with a showerhead. The walls are tiled with white rectangular tiles and the floor is covered with grey 


linoleum, Simple, clean, functional, standard, basic, minimalistic, bright lighting, plain, High 
resolution, sharp focus, realistic lighting, standard aspect ratio. 


= 


Standard bathroom 


Positive prompt 


Multiple layouts, avoid bright and clean elements, low quality, bad proportion, normal quality, 


Nee Hye eietD watermark, bad perspective, confusing details, text, blurry 
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Musculoskeletal 
aging user bathroom 


Musculoskeletal Aging, bathroom for the elderly, wall handrail attached, non-slip tile, perspective 
Positive prompt view, wide angle, Simple, clean, functional, standard, basic, minimalistic, bright lighting, plain, High 
resolution, sharp focus, realistic lighting, standard aspect ratio, standard. 


Multiple layouts, avoid bright and clean elements, low quality, bad proportion, normal quality, 


REBAR pioni watermark, bad perspective, confusing details, text, blurry 


3.2 Physical Characteristics-Based Training for Spatial Image Generation 


Upon reviewing the generated images, it is evident that the spatial visualization capability is impressive; however, 
images are being generated without considering user conditions, required expertise, and specific situations. 
Observing the Korean-style bathroom images, the structure of a typical Korean apartment bathroom, including the 
sequence of toilet, sink, and shower booth/bathtub, is not reflected in the generated images. Instead, the images 
are generated with a focus on a single bathroom component, neglecting the holistic bathroom layout. In the case 
of elderly friendly bathroom images for older users, safety facilities should be reflected to prevent accidents when 
the elderly use the space. However, the generated image does not reflect the safety equipment properly, or, even if 
installed, the safety equipment is attached to a space other than the bathroom, generating a facility that is unsuitable 
for space conditions. Consequently, to utilize image-generating AI for generating 


Elderly-friendly bathroom space images, an approach involving the addition of learned images that incorporate 
safety equipment in appropriate positions within the bathroom based on medical expertise and the physical 
characteristics of the elderly is required. 


Input : Text Prompt Image Generation Al Output : Generated Images 


Space type / User physical User Requirements Reflected 
characteristics & requirements Space Image 


Additional Training Model 


Data Preparation —> Hyperparameters Optimization —> Training 


Figure 1 Summary of the configuration outlined in this study. 


4. ADDITIONAL TRAINING FOR DESIGN COMPONENTS VISUALIZATION 


4.1 Data preparation and Pre-processing 


The datasets required for further training should comprise image files along with corresponding text descriptions. 
However, the available training images for Korean-style bathroom structures, obtainable from the current website, 
predominantly consist of wide-angle images to capture confined bathroom spaces. Consequently, even with 
additional training, the generated images might continue to emphasize wide angles or exhibit pronounced 
distortion. Furthermore, when considering bathrooms designed for elderly users, images depicting safety 
equipment installed by non-experts without professional knowledge outnumber those accounting for 
individualized physical characteristics. This discrepancy could potentially result in compromised safety and 
reliability. Therefore, the generation of image data conducive to effective learning is imperative. 
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Given this scenario, it becomes essential to independently generate image data specifically tailored to bathrooms 
for the elderly, considering their unique physical attributes. To accomplish this, a comprehensive exploration of 


Spatial Condition User Condition 
| Musculoskeletal aging 


Fa user's bathroom 
Residential space Bs, f lif | 


i: p| Senior user Bathroom 


( H H i} 


L J$ 
| ‘ Bathroom based on 
l i ia Bathroom x 
\ user's body 
| characteristics 


Figure 2 Scope of Work 


bathroom layouts, incorporating safety equipment based on the aging characteristics of the elderly, is crucial. 
Additionally, the collection of fundamental components necessary for spatial Building Information Modeling 
(BIM) is paramount. In this paper, we have developed an additional training model that centers around 
musculoskeletal aging, a facet of the aging process significantly influenced by environmental factors and exerting 
significant effects. To ensure the safe use of the bathroom, safety equipment that can be installed includes bathroom 
grab bars, floor mats, shower chairs, and more(Gitlin, et al. 1999)(Aminzadeh, et al. 2000). 


Table 2 Information about Properties of BIM Objects 
Height Width Depth 


Height Width Depth 


N Bathroom facilities ED an (aa) N Bathroom facilities (aa) (eh. (iia) 

a See ion 750 470 6 Walk-in bathtub 780 1150 930 
wide to 

2 Toilet wall grab bar 304 685 50 T Draw-out faucet 220 505 273 

3 Washbasin with chair 750 595 455 8 Folding shower chair 424 - 604 

4 Nonslip floor - - 12.7 9 Bathtub/shower grab bar 889 482 50 

5 Toilet side grab bar 300 100 738 10 Shower curtain 1803 107 80 


<Table 2> represents a list of BIM objects for bathroom safety equipment collected for the purpose of BIM 
modeling. This list is a compilation of objects sourced either from the bimobject website, which offers 
downloadable objects for 3D modeling, or generated directly. The collected objects are categorized based on users' 
physical characteristic, bathroom areas, and expected effects. This categorization serves as the foundation for 
crafting content in the training text, enabling the generation of images in accordance with prompts entered by users 
during the image generation process. The contextual text concerning bathroom areas and anticipated effects will 
be utilized in the future for the individualized training of each object when they are added separately to the model. 


Table 3 Categorization of Safety Facilities 
Classification criteria Bathroom facility BIM object 
1 2 3 4 5 6 7 8 9 10 


Aging characteristics A-1. MA e ° ° e ° ° ° e ° e 


B-1 Anti-slip e e 


B-2 Maintaining body temperature e 
Expectation 


š B-3 Smooth movement e e 
effectiveness 


B-4 Supporting device e e ° ° e ° 


B-5 Emergency call facility 


C-1 Basin e ° 
C-2 Toilet e ° 
C-3 Bathtub e ° 
Bathroom area C-3 Shower booth ° ° e 
C-4 Floor e 
C-5 Ceiling 
C-6 Wall 
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4.2 Additional Training 


To facilitate further learning, a dataset for additional training model is established by modeling safety-equipped 
bathrooms suitable for each stage of aging that was previously generated<Figure 3>. This dataset encompasses 


Figure 3 3D Space Modeling and Rendering 


training image data sets and individual text files (txt) associated with each image. These efforts are aimed at 
constructing a dataset for generating additional training models. 


The generation of text files is facilitated through the utilization of BLIP Captioning within the Koha_ss GUI, an 
option found under Utilities. BLIP Captioning enables the individual generation of text files for a substantial 
number of images. For the task of further training, a LoRA model is generated employing Dreambooth-based 
LoRA GUI, Kohya_ss. LoRA, denoting Low-Rank Adaptation, effectively trains on high-quality images. The 
training settings for Dreambooth LoRA include parameters such as Train batch size: 1, Epoch: 120, Learning rate: 
0.0001, Learning rate scheduler: cosine, and Learning rate warmup: 10. The model employs the Stable-Diffusion- 
v1.5 as a pre-trained model. The training environment was executed on a PC equipped with an RTX A6000 GPU 
model. Utilizing prepared training data, the model generation results in a .safetensors formatted model file of size 
144MB. 


5. EVALUATION OF ADDITIONAL TRAINING MODELS 
5.1 Evaluating Enhanced Model Performance 


The manipulation of model weights in its application allows for varying degrees of reflection in the generated 
images. When prompts are entered, the LoRA augmented model, along with its associated weight, can be specified 
for incorporation, as illustrated in <Table 4> presenting the test images. The weight values, ranging from low 0 to 
high 1, facilitate a continuum of image representation. A weight value of 0 corresponds to an image where the 
model has not been applied, thus lacking the reflection of Korean-style bathroom structures and characteristics. 
Conversely, as the weight approaches 1, a gradual integration of Korean-style bathroom imagery becomes apparent. 


Table 4 Generated image result according to model weight 
Weight Generated Images 


0.0 


0.3 
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0.7 


5.2 Utilization Scenarios of the Additional Training Model 


For the purpose of generating bathroom spatial design images based on physical characteristics, further training 
was conducted specifically focusing on age-related musculoskeletal changes. In the context of this study, a 
comparative analysis is carried out between images generated using the trained model and images generated 
without the utilization of the model. The intent is to contrast the images produced by the model-based approach 
with those produced independently, as illustrated in the table below. 


Table 5 Comparing images for elderly users' bathrooms with/without extra models. 


Case A Case B 
F Á Space Bathroom (Residential Space) 
7 Z hysical 
38 User physica : 
3 characteristics Musculoskeletal aging user 
Model N/A Musculoskeletal aging user bathroom (.safetensor) 
Z AIP_Elderly-friendly bathroom, Musculoskeletal aging, senior-friendly, senior housing, safety, fall 
a Posiüveproipt prevention, bathroom, restroom, elderly, senior citizens, elderly users over 70, supporting device, 
H prone device for safe toilet use, nonslip tile, toilet wall grab bar, toilet side grab bar, shower curtain, basin 
chair, folding chair 
Negative prompt Bad quality, duplicate, blurry, bad proportions, confusing details 
O 
=æ 
z Generated image 
= 


Overall, when comparing the generated images of Case A, which were produced without utilizing the augmented 
training model, with the images of Case B generated using the augmented training model, it becomes evident that 
Case A exhibits inaccuracies in the positioning and forms of the attached fixtures. Conversely, for Case B, where 
the augmented training model was employed, no errors are observed in the placement and forms of the safety 
equipment. 


6. CONCLUSION 


In conclusion, this paper explores the integration of AI in spatial remodeling and initial design stages with a focus 
on generating spatial images that reflect user physical characteristics. The objective is to provide users with safe 
spatial images, taking into consideration the user's physical attributes and real-world usage within the space. 


948 


Therefore, this paper targets the bathroom space and selects elderly individuals as the space users, aiming to 
generate safe bathroom images for seniors experiencing physical characteristics related to musculoskeletal aging 
through additional model training. 


To facilitate additional training, suitable safety equipment for the user's physical aging characteristics was 
investigated, and high-quality training data were constructed by generating image data independently. 
Corresponding training text files were created for each image to ensure specific training for the images. As a result, 
it was observed that the images generated using the additional training model had fewer errors related to safety 
equipment compared to using the existing model, and suitable safety equipment was placed within the space, 
demonstrating a different outcome. This cost-effective approach recognizes the potential of AI in the field of spatial 
design, prioritizes a user-centric approach at the intersection of AI and architectural design, and advances and 
improves the design process. In addition to musculoskeletal aging, which was the focus of selecting elderly 
individuals as the target for additional training model creation in this paper, a comprehensive examination of aging 
occurring in various body structures or the selection of a more diverse range of subjects can expand the scope and 
target of additional training. Beyond simple visualization, this enables detailed spatial visualization based on user 
requirements through text input for image generation, which can be expected to be utilized in various fields for 
Al-generated images. 
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ABSTRACT: This paper explores the applicability of Image-generation AI in the field of interior architectural 
design, with a particular focus on automating interior design representation based on design styles. Interior design 
representation involves a complex process that integrates visual elements with functionality and user experience. 
Effectively visualizing this process is essential for facilitating communication among the various stakeholders 
involved in the design process. However, traditional visualization methods are constrained by expert resources, 
costs, and time limitations. In contrast, image-generation AI has the potential to automate various design elements, 
including design styles, components, and spatial arrangements, to enhance representation. In this study, we 
evaluated the performance of a base model using various design styles and, based on the evaluation results, 
selected styles for fine-tuning. The methodology for fine-tuning these design styles involved the following steps: 1) 
data preparation and preprocessing, 2) hyperparameter optimization, and 3) model training and construction. 
Utilizing the fine-tuned model thus constructed, we conducted image generation demonstrations. The research 
results revealed that design styles not well represented by the base model were effectively captured, and high- 
quality images were generated by the fine-tuned model. Notably, this fine-tuned model demonstrated the ability to 
represent images of specific design styles with a high degree of accuracy in capturing the characteristics and 
keywords associated with each style, compared to the base model. This implies that through fine-tuning image- 
generation AI, a wide range of applications can be inferred when aiming to create customized designs by 
considering these aspects. In conclusion, this study explores an efficient approach to interior design representation 
in the field of interior architecture by employing image-generation AI and proposes a method to effectively 
generate visualized images by training on design style keywords. Through this approach, our study can contribute 
to improving the interior design process by facilitating the generation of visualized images that reflect design styles. 
Furthermore, the study aims to suggest the potential for applying this approach not only to the field of interior 
architecture but also across various domains to achieve effective visualization. 


KEYWORDS: Interior Architecture Design, Interior Design Representation, Generative AI, Model Fine-tuning 


1. INTRODUCTION 


Interior design representation plays a crucial role in the field of interior architecture, effectively conveying ideas 
and designs through visual media and facilitating effective communication among various stakeholders involved 
in the design process (Chiu, 1995). In interior spaces, design styles signify the approach and method of planning 
and decorating a space, shaping and emphasizing the aesthetic, functional, and psychological aspects of the space. 
Design styles encompass a variety of preferences and trends influenced by the users and purposes of the space, 
impacting choices in color, patterns, materials, furniture, and accessories (Goldschmidt et al., 1998; Eckert et al., 
2000). Additionally, they serve as a means to reflect individual identity and lifestyle, reflecting personal 
preferences and tastes. 


Therefore, understanding and proposing customized designs that consider user preferences in the spatial 
visualization process is essential. However, this process necessitates expertise to comprehend the diverse 
preferences and requirements of users, as well as the desired design styles and spatial elements for visualization. 
This requires a significant investment of time, cost, and effort for both experts and non-experts (Lee et al., 2020). 


Recent advancements in deep learning technology have sparked significant interest in generative artificial 
intelligence (Gen AT). As a result, various research endeavors are underway in the realm of visual content creation 
using image-generation AI (Image-Gen AI) based on large language models (LLMs). Expanding upon this trend, 
our study aims to propose an approach for automating interior design representation using Image-Gen AI. This 
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approach allows for the generation of diverse design visualization alternatives based on user preferences and 
objectives, all without the need for specialized expertise. 


Design Style 


Keywords | 
Interior Ee 
Design Styles 4 


Image-Gen AI 


Fig 1: An Overview of the Study 


Interior Design 
Representation 
for Spatial Visualization 


2. BACKGROUND 
2.1 Deep Learning-based Image-Gen AI 


Image-Gen AI is based on deep learning and is a versatile technology applicable in various fields, including natural 

language understanding, computer vision, image processing, data generation, prediction, and more (Liu et al., 2021) 
This technology is used to generate new image content or outputs based on given data or information. To 

accomplish this, Image-Gen AI is pre-trained on large datasets and then fine-tuned for specific targets. During the 

training phase, Image-Gen AI learns features and patterns from the data, and in the generation phase, it uses this 

learned information to generate new image data. This process can be considered an example of transfer learning, 

allowing effective generation results even with limited data. Image-Gen AI can be trained to incorporate additional 

conditions such as image type, style, color, and more, enabling it to generate images that meet specific criteria. 

This reduces the need for extensive training on the target while enabling various applications like style 

transformation, ensuring similarity between images, image synthesis, and more (Nichol et al., 2021). 


For these reasons, recent applied research efforts are being conducted in the field of visual content creation using 
a variety of Image-Gen AI models such as Midjourney (Oppenlaender, 2022), DALL-E 2 (Ramesh et al., 2022), 
Stable Diffusion, and others (Ramesh et al., 2022; Saharia et al., 2022; Rombach et al., 2022; Oppenlaender, 2022). 
While research using Image-Gen AI has been extensive, it remains limited in the field of interior architecture. 
Therefore, in this study, we aim to explore an approach to fine-tune Image-Gen AI models based on design styles 
and implement a model for the automatic visualization of interior design representation. 


2.2 Potential for Automating Interior Design Representation through Image-Gen AI 


In the field of interior architecture, the evolution of deep learning technology is reshaping the way spaces are 
conceptualized and realized. In the past, designers relied on manual sketches and 2D drawings to convey ideas for 
spatial visualization reflecting interior design representation (Ching, 2011). However, with the emergence of 
advanced technology and computer-aided tools, spatial visualization has undergone a paradigm shift (Karras et al., 
2018). The integration of sophisticated software, computer-aided design, and 3D modeling tools has empowered 
designers to visualize spaces realistically and immersively. Thanks to these advancements, designers can 
accurately represent intricate details such as lighting, materials, textures, and shadows, not just the physical layout. 
As a result, stakeholders, including clients and project collaborators, can experience the proposed design in a 
lifelike manner before actual construction commences (Ah-soon & Tombre, 1997; Oxman, 2006). 


With the continuous advancement of deep learning technology, Image-Gen AI can effectively generate images that 
match the intended target by fine-tuning a base model pretrained on large datasets. Leveraging these characteristics 
of Image-Gen AI, it is possible to implement an interior design representation model based on specific design 
keywords, fine-tuning it to reflect the desired design style. This model can be utilized as a tool to generate a variety 
of design alternatives that users desire during the design process and enhance the decision-making process (Jeong 
& Lee, 2023). Therefore, in this study, we aim to conduct fine-tuning of design styles on Image-Gen AI and explore 
methods for automating interior design representation. 
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3. MODEL FINE-TUNING FOR INTERIOR DESIGN REPRESENTATION 


3.1 Overall Process 


In this study, the image generation performance of three major AI platforms in the Image-Gen AI field, Stable 
Diffusion, DALL-E 2, and Midjourney, was examined and compared. First, these Image-Gen AI platforms employ 
two main methods: text-to-image (Txt2img) and image-to-image (Img2img). Each of these platforms has unique 
and distinctive image generation capabilities, along with their specific technical attributes, strengths, and 
limitations. 


DALL-E 2, trained on a large-scale image dataset, demonstrates exceptional abilities in generating detailed and 
complex images. However, due to its complexity and resource-intensive nature, it may have longer processing 
times, and its dependency on text prompts might result in shortcomings in image stability and consistency. 
Midjourney excels in generating images inspired by specific visual styles or artistic aesthetics. However, its 
emphasis on artistic expression may result in relatively less accurate representation of real objects or scenes. Stable 
Diffusion prioritizes stability and accuracy in image generation. It is based on LLMs and provided as an open- 
source platform, making it user-friendly for customization. The transparency of the source code allows users to 
understand, modify, and apply the underlying algorithms, enabling them to verify, align with their intended 
purposes, and enhance image generation results. 


Each platform has its unique strengths and specific limitations. Considering these factors, this study utilized Stable 
Diffusion, which demonstrated the most outstanding performance in terms of image generation stability, accuracy, 
and the ability to cater to specific requirements through open-source access. The research methodology, rooted in 
the utilization of Stable Diffusion, is structured into three primary phases. Step 1: Testing image generation via 
text descriptions for design styles, Step 2: Model Fine-tuning, Step 3: Evaluation of Fine-tuned Models. The 
schematic representation of the entire process outlined in this section is depicted in Fig 2 below. 


Step 1 Step 2 Step 3 


Interior Design Styles 


> Training Dataset Image-~Gen AI 
anne en, ea "i 
4 ' Fine-Tuned Model 
: ; Y 
Image-Gen AI Image-Gen Al 
The Default Model The Default Model 


v v y { Interior Design 
: Representation 
Superior Inferior 5 ` : | Sie r 
Recognition Recognition Í Fine-Tuned Model # for Spatial Visualization 


Fig 2: The Process of Model Fine-tuning 


3.2 Step 1: Testing Image Generation via Text Descriptions for Design Styles 


In this section, we conduct tests to evaluate the performance of the Image-Gen AI model for various interior design 
styles. Our study focuses on the text-to-image (txt2img) approach, generating images based on text descriptions. 
To prevent the Image-Gen AI model from inferring styles solely from text descriptions, we perform prompt 
engineering adhering to guidelines. This prompt consists of two key components: 1) prompts for the target design 
style, and 2) prompts for image quality. Additionally, we categorize positive and negative aspects that encompass 
both reflective and non-reflective elements of the generated images. 


The image generation process utilized the DPM+2M Karras sampler along with the widely used open-source 
model, SD1.5V checkpoint (v1-5-pruned.ckpt). Essential configurations, including sampling steps and CFG scales, 
were set to default values, and the image size was defined as 1024x512 pixels. Each image generation took an 
average processing time of approximately 5 seconds. Table 1 below illustrates the configuration settings employed 
in the image generation process, and Table 2 showcases the standard format of the text prompts utilized for image 
generation. 


Based on the preceding discussion of the image generation process, we conducted an evaluation of the recognition 
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level for interior design styles using the generated images. Table 3 below provides outcomes of the generated 
images based on their recognition levels. The base model of Stable Diffusion demonstrated a stable generation 
capability for high-quality images across most styles. However, its expressive capacity was relatively limited in 
terms of being recognized as specific design styles, particularly due to lower comprehension of certain styles. To 
address these constraints and enhance image generation accuracy, we deduce that fine-tuning for specific targets 
is imperative. 


Table 1: Configuration Settings for Image Generation 


GPU Base model Sampling Method Sampling steps CFG Scale Resolution 
A6000 1024x512 
47.5 VRAM SD v1.5 ckpt DPM+ 2M Karras 25 13 (2:1) 


Table 2: The Standard Format of Prompts Used for Image Generation 


Prompts Positive Negative 


Design Style Design Style interior, Space zoning None 


Professional photograph, photorealistic rendering, 
realistic, enhance-detail, v ray rendering, full HD, | Bad proportion, Low quality, awkward shadows, 
masterpiece, highly detailed, high quality, 8k, full | unrealistic lighting, pixelated textures, Worst, noisy, 


Image Quality shot, deep depth of field, f/22, 35mm unrealistic reflections, normal quality, watermark 


Table 3: Sample Results of Generated Images based on Recognition Levels 


Recognition Level Superior Inferior 


Image-Gen AI 


based generated Images “Industrial Style Interior, A living room” “Brutalism Style Interior, A living room” 


3.3 Step 2: Model Fine-Tuning for Design Styles 


In this section, the process of fine-tuning the model for the target design style (ex. Brutalism) encompasses three 
key steps: 1) Data Preparation, 2) Hyperparameter Optimization, and 3) Training. 


Firstly, in the Data Preparation step, we focused on the collection and preprocessing of data tailored to the specific 
target design style. This Training Dataset requires two primary components: Image Data and Text Data. Table 4 
provides an example of the training dataset. 


Table 4: An example of the training dataset / content (e.g., a space), style (e.g., a design style) and scene description. 


Image Data Text Data 


“A Brutalism style interior in a Living room with sharp lines that exemplify the brutalist aesthetics. 
Featuring shades of grey, concrete, and metallic tones, it showcases a minimalist living room 


characterized by grey hues and a monochrome color scheme.” 


The subsequent stage involved the optimization of the model's hyperparameters to elevate its image generation 
performance for the specified design styles. This process encompassed the refinement of parameters like learning 
rate, batch size, and network architecture to attain improved outcomes. Table 5 presents the hyperparameters 
employed during the model fine-tuning. 
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Table 1: Optimized Hyperparameters for Model Fine-tuning 


Epoch Batch size to Learning rate Learning rate 
Training data (Training steps) train Learning rate Scheduler warmup 
15 100 1 0.0001 Constant 10 


The ultimate training phase involved the model being subjected to training using the meticulously prepared dataset 
and meticulously optimized hyperparameters. The process of model fine-tuning was executed using the training 
dataset and specified hyperparameter configurations to create the target model. The training duration amounted to 
approximately 25 minutes. Through this training, the model was tasked with comprehending the distinct attributes 
and intricacies of the designated design style, consequently empowering it to craft images that exhibit a heightened 
alignment with the intended aesthetic. This fine-tuned model is utilized in conjunction with the default model 
during subsequent image generation endeavors. 


3.4 Step 3: Evaluation of Fine-tuned Models 


Through the subsequent fine-tuning process, the model learned from both image and text data related to design 
styles that initially exhibited inferior recognition. As a result, it was enhanced to effectively depict the distinctive 
characteristics of the trained design styles, showcasing a heightened ability to generate high-quality images. Table 
6 below illustrates the results of image generation based on the application or absence of the fine-tuned model. 
This highlights the tangible impact and comparison of interior design style image generation that was not 
achievable before the model's fine-tuning adjustments. Additionally, the weights of the model parameters 
correspond to finer adjustments made to the model, reflecting a more distinct influence on the image generation 
process. 


Table 6: Qualitative Comparison of Image Generation: Impact of Applying Fine-Tuned Model 


Weight of 
Fine-Tuned Model at W 30% at W 60% at W 90% 


Image-Gen AI 


based generated Images “Brutalism Style Interior, A living room” 


4. DEMONSTRATION 


In this study, based on the procedures outlined in the previous Section 3, a demonstration was conducted using 
Image-Gen AI to generate images of more than 15 interior design styles. The target space was limited to residential 
living rooms. The generated image results for each design style are presented in Table 7, encompassing both the 
default model and the fine-tuned model for image generation. This demonstration facilitates practical comparisons 
in the interior space visualization process through the generation of design alternatives, as proposed in this study. 
Additionally, it allows for the observation of the potential for learning various customized design styles. 


Table 7: Image Generation using Default and Fine-Tuned Models 


Design Style Al-based generated Images 


Modern 


Contemporary 
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Industrial 


Scandinavian 


Bohemian 


Rustic 


Hygge 


Maximalist 


Shabby 


Provence 


Art Nouveau 


Oriental 


Colonial 
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5. CONCLUSION 


In this study, we investigated a method for fine-tuning design styles using Image-Gen AI and automatically 
generating spatial images reflecting interior design representation. During the research process, we conducted an 
evaluation of the recognition level for interior design styles by the base model. We implemented a design style 
visualization model based on detailed keywords for the Brutalism style, which was chosen as one of the fine- 
tuning targets. The model effectively learned the characteristics of the style and demonstrated the ability to 
intricately represent the visual attributes of the style. 


Through comparative analysis with the base model, we confirmed the high likelihood of visualizing the features 
of the style, thus validating the capability to effectively visualize spaces that align with user preferences through 
additional fine-tuning for interior design styles. Furthermore, this research approach showcases the potential for 
Image-Gen AI to be utilized in various fields, and we aim to suggest its applicability in future research and 
application domains. 
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ABSTRACT: This paper presents the potential utility of generative artificial intelligence-based light analysis 
simulation visualization image in the early phase of architectural planning and design. Facilitating the simulation 
of a building's performance during the early stages of planning and design presents numerous advantages, such 
as cost savings and enhanced ease of communication among stakeholders. However, the assessment of design 
performance is typically conducted during the design development phase or post-design completion. Processing a 
substantial volume of data based on design alternatives demands considerable time and resources, thus 
constraining the immediate provision of simulation results. This paper aims to utilize generative AI to produce 
visualization results of simulations with a predefined level of accuracy, with a specific focus on the architectural 
aspect rather than the physical and engineering functionalities of the simulation. Consequently, the study employs 
the following approach: 1) Analyze prominent characteristics and elements within light analysis simulation. 2) 
Based on this analysis, generate high-quality visualization image data additionally through Building Information 
Modeling (BIM). 3) Construct a dataset by pairing the generated lighting analysis visualization image with 
prompts. 4) Utilize the established dataset to create an additional learning model for light analysis visualization 
images. This study is expected to provide immediate and efficient assistance in design decision-making during the 
early phases by generating visualization images with high accuracy, reflecting prominent qualitative aspects 
related to light analysis and processing within the simulation. 


KEYWORDS: Architectural Design, Architectural Visualization, Generative AI, BIM (building information 
modeling), Fine Tuning Model 


1. INTRODUCTION 


This study aims to utilize generative artificial intelligence (AI) to create and employ light analysis visualization 
images within architectural spaces. The current simulation methods quantitatively derive predictive outcomes 
based on physically designed environmental conditions, which are then visualized. However, as the number of 
design alternatives increases, processing extensive data incurs time and cost, posing a limitation, particularly in 
promptly delivering results during the design phase. In the initial design stages, swift generation and evaluation of 
various design alternatives are vital to meet given requirements. During this process, offering intuitive 
visualization results rapidly proves more effective than ensuring the precision of simulation outcomes. Therefore, 
this research is conducted with a primary focus on architectural visualization, which aids in the early design phase, 
rather than solely relying on physical and engineering-based imagery. In the initial design phase, the precision of 
the design model is diminished due to the uncertainty of design conditions. However, the utilization of this 
technology enables straightforward assessment of visual performance aspects, such as a building's energy 
efficiency and lighting environment, even at the conceptual model level. Particularly, these visualizations serve as 
effective tools for comparing and evaluating various design alternatives, fostering communication among 
stakeholders. 


Building upon this foundation, a test of the potential for light analysis visualization images in architectural spaces 
is conducted using image-generating AI. However, the generated images lack reflection of the elements and 
characteristics of light analysis visualization within spaces, thus clearly indicating the need for further refinement 
through additional training. For the purpose of generating light analysis images using AI, a process involving '1) 
Setting the Scope of Light Analysis, 2) Data Preparation, and 3) Training' is carried out. Representative elements 
of general characteristics from light analysis images are chosen to define the range of training data generation. 
Model construction employs a diffusion-based model implemented based on the Large Language Model (LLM) 
for additional training, and hyperparameters are adjusted to ensure the generation of high-resolution images. The 
constructed supplementary training model is demonstrated in real-world applications of image-generating AI, such 
as the creation of light analysis visualization images based on different time periods. 
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2. BACKGROUNDS 
2.1 Image Generation Artificial Intelligence (AI) 


The recently emerged generative AI paradigm has entered its preliminary stages, yet it wields substantial influence 
across diverse industrial sectors. It is anticipated that with ongoing technological advancements, this nascent field 
will expand the horizons of innovation [Mackinsey, 2023]. Image generative AI such as Stable Diffusion and 
DALL-E have instigated substantial transformations akin to sectors beyond, including art and entertainment, 
within the domain of architectural design as well. However, the latent potential within the realm of architectural 
design visualization remains notably underdeveloped. 


This study aims to propose a novel approach to architectural design visualization through the utilization of image 
generative AI models. Architectural design methodologies intertwined with AI offer a departure from conventional 
design practices that have traditionally relied upon designers’ creativity and expertise to address multifaceted 
requirements. By systematizing the data generated and incorporated during the design and construction processes 
through automated tools, the objective is to assist in resolving ambiguities, risks, and other issues that might arise 
in human-executed tasks by architects, contractors, and related stakeholders. 


Given the advancements in LLM models and image generation technologies, the capacity to generate architectural 
visualization images founded on provided textual input has now materialized. Termed as the process of text-to- 
image generation, this procedure holds the capability to engender highly realistic images, thereby serving as a 
multifunctional instrument for the creation of an extensive gamut of architectural visualization content. In light of 
the continued evolution of AI technology, text-to-image generation is anticipated to assume a pivotal role within 
the architectural domain. Consequently, image generative AI augments the prospects of creative potential beyond 
conventional methodologies. 


2.2 Potentials of Generative AI on Architectural Visualization 


Architectural visualization plays a crucial role in effective communication during the design process due to the 
intricate nature of design and spatial characteristics [Chiu, 1995]. Notably, architectural visualization techniques 
such as 3D modeling provide a comprehensive understanding of spatial relationships [Eastman, 1999]. They serve 
as essential tools for visually expressing complex designs and enabling clear communication with clients and 
stakeholders. This facilitates the facile comprehension and assessment of project concepts and designs, enabling 
the early identification of design flaws, leading to cost and time savings and enhancing satisfaction. In the initial 
stages of design, they prove particularly effective for comparative analysis and review of various design 
alternatives. Historically, the generation of architectural visualization images required specialized hardware such 
as GPUs, along with the utilization of dedicated architectural software. This demanded a significant investment of 
time and effort, ranging from conceptual design configurations to comprehensive design processes. However, with 
the advent of generative AI, the landscape has transformed. Now, it is possible to efficiently create numerous 
architectural visualization images with high-performance GPUs, without the necessity for separate platform 
installations. Through web browsers, one can seamlessly generate highly detailed, high-quality visual images using 
text-based commands (prompts). This transformation marks a paradigm shift in architectural visualization, 
affording designers an unprecedented level of efficiency and versatility in the creation and communication of their 
spatial visions. 


3. INTENSIVE TEST USING GENERATIVE AI FOR SPATIAL LIGHT ANALYSIS 
VISUALIZATON IMAGE 


3.1 Image Generation Test for Spatial Light Analysis Visualization Image 


In this study, an examination was conducted to assess the feasibility of employing generative AI, utilizing the 
open-source image generation model "Stable Diffusion" (2022, Stability AI), for the immediate generation of 
spatial light analysis visualization images. During the testing phase, emphasis was placed on conducting 
comparative analyses of text-image generation and visualization performance. Furthermore, the scope was 
delimited to prioritizing visual effects and approximate simulation result visualization, rather than accuracy and 
precision. While there are two main approaches for Al-assisted image generation, namely, 1) image-to-image 
(img2img) and 2) text-to-image (txt2img), the latter method of using AI to generate architectural schematics was 
adopted for testing purposes. 


Prompts were formulated under four categories: 1) Scene Description, 2) Geographic Location, 3) Image Quality, 
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and 4) Light Analysis Conditions. The generated images were set to a resolution of 1024x512 pixels. The testing 
was conducted using the SD model within a local PC environment. A total of 2,000 images were generated through 
200 images per testing scenario. On average, it took approximately 5 seconds to generate a single image. 


Prompt ] f 
_—$—$> $$ | | 
Positive Prompts: Light analysis view PEP 
Energy simmulation, 7am., dawn 
Negatrve Prompts: low quality, poor 
= | Sean 
ji 


Fig. 1: Procedure for Spatial Light Analysis Visualization Image Generation Test 


j 


proportions, normal qualsty, bad detail, — 
blurry, foggy, bad quality shadow — 
unreal engine, sketch ugly, prvelated 
watermark, people, pet 


3.2 Results of Spatial Light Analysis Visualization Image Generation Tests 


Based on the results of the previously conducted spatial light analysis simulation visualization tests, the current 
model has been found unable to generate interior light analysis visualization images of spaces. The lighting 
simulation visualization images generated through existing image generation models did not exhibit the 
characteristics of typical simulation images produced using simulation tools, revealing two key issues. Firstly, the 
model failed to recognize objects composing the space such as windows, ceilings, and walls, as well as lighting 
fixtures; hence, properties like shading and luminance were not accurately reflected. In essence, these simulation 
visualizations did not consider the lighting analysis environment. Secondly, there was a lack of consistency in the 
simulation image outputs, indicative of the absence of defined methods for visualizing quantitative lighting 
analysis outcomes (view type, visualization style). Consequently, the comparison of design alternatives under 
uniform conditions became unfeasible. To address these challenges, it is imperative to undergo additional training 
using lighting analysis images that incorporate the visualization elements and attributes pertinent to lighting 
simulation. While the existing model proves efficient in generating images across a wide range of domains, 
enhancing the model's capabilities through additional training is essential for tailored image generation in specific 
fields due to the constraints posed therein. 


4. ADDITIONAL TRAINING FOR VISUALIZING LIGHT ANALYSIS 


For the purpose of Al-driven light analysis image generation, an approach involving the following processes: 1) D 
Definition of the scope of Light Analysis, 2) Data preparation, and 3) Training, is proposed. 


4.1 Definition of the scope of Light Analysis 


This study focuses on indoor lighting visualization images achievable during the initial stages of interior 
architecture design through AI methodologies. The light analysis is applicable across the first three stages of design 
elaboration as outlined in ISO16817 (Project definition — Conceptual design schematic design — Detailed design — 
Final design). Given that decisions made during the initial design phase significantly influence the subsequent 
design process direction, preemptively understanding the potential impact of initial design decisions holds 
paramount importance [Kalay, Y. E. (2004)]. 


4.2 Data preparation 


During the stage of Data Preparation, meticulous consideration was given to the types of training data, the extent 
of generative scope, and the methods of data creation. Through the utilization of Building Information Modeling 
(BIM) and rendering techniques for lighting simulation imagery, a process was employed to define the range of 
light influence elements within the visualization components of lighting simulation, thus facilitating the generation 
of training data comprising Light Analysis simulation images for indoor spaces. 


4.2.1 Categorization of Training Data and Scope of Generation 


Within the framework of this study, the scope for generating training data was determined based on considerations 
encompassing spatial design elements, lighting design components, and visualization techniques. Spatial 
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dimensions were confined to the living room, accounting for spatial design elements such as ceilings, floors, walls, 
and windows (with respective sizes). Lighting design elements were delimited to natural light (primary light 
source) and specific timeframes (7 a.m., 12 p.m., 6 p.m.), with sunrise at 7 a.m., noon at 12 p.m., and sunset at 6 
p.m. The visualization techniques were confined to interior views and photorealistic representations, serving as 
the basis for generating the training dataset. 


4.2.2 Methodology for Generating Training Data and Illustrative Cases 


For the process of generating training data, the BIM software named "Revit" was employed to execute interior 
space (living room) BIM modeling. Subsequent to this, the Revit plug-in program known as "Enscape" was utilized 
to generate light analysis simulations and rendering images of the modeled interior space. The ensuing outcome 
images arising from these procedures have been presented in the table indicated by the respective table number. 


Table 1: BIM Rendering Image for Additional Training 


Training Data Type Training Data Image 


Light analysis view 
render image_7a.m. 


Light analysis view 
render image_12p.m. 


Light analysis view 


render image_6p.m 


4.3 Training 


The Additional training was carried out on a local PC equipped with an RTX A6000 GPU model boasting 47.5GB 
of memory capacity. Two distinct methodologies were employed for additional training: 1) Fine-tuning of the 
Stable Diffusion (SD) model using the Dreambooth approach, and 2) Training of the SD model using the Low- 
Rank Adaptation of large Language Models (LoRA) technique. LoRA, a technology employed for image 
generative AI fine-tuning, facilitates the creation of additional training model files in a brief time frame without 
necessitating intensive GPU performance. LoRA enables few-shot learning and offers the advantage of promptly 
and easily observing the impact of styles by altering model file weights. The resultant “. safetensors" LoORA model 
files, generated upon completion of additional training, can be copied and utilized on other devices. 


Fine-tuned models were developed, categorized into cases of fine-tuning the SD model itself and fine-tuning the 
SD model with the application of LoRA. Each category encompassed three learning types corresponding to 
different time frames influenced by natural light (7 a.m., 12 p.m., 6 p.m.). Rigorous hyperparameter configuration 
and combinations were systematically implemented to facilitate precise additional training. Hyperparameter 
optimization enhanced the quality of generated training model images. The array of hyperparameters considered 
in this study ranged from image size, batch size, epoch, Caption Extension, learning rate, learning rate scheduler, 
to learning rate warmup. Given the utilization of LLM-based models for additional training, the process 
encompassed engineering and pairing image and text (prompt) data. Prompts were classified into Positive prompts 
and Negative prompts, based on their application status. 


Table 2: Configuration Values of Hyperparameters for Additional Training 


Model Training data 


Fine tuning 
2 Prompt text 
Base Fine-tuned 
hyperparameters 


Image type Number 


Model Model Positive prompt Negative prompt 
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Light analysis view, Interior Train Batch 
Light analysis 
image, Revit, Enscape, p Size: 2 
Model 1 view render 10 low quality, poor 
Render image, Energy ; Epoch: 150 
image_7a.m. quality, bad 
simulation, 7a.m., dawn Caption 
proportions, gross 
. a ; ; Extension: .txt 
Light analysis view, Interior] proportions, normal 
V1-5- Light analysis Learning 
image, Revit, Enscape, quality, bad detail, 
puned Model 2 view render 10 rate:0.001 
Render image, Energy blurry, foggy, bad 
ckpt image_12p.m. Learning Rate 
simulation, 12p.m., noon |quality shadow, unreal] 
: Scheduler: 
engine, sketch, ugly, 
Light analysis view, Interior Constant 
Light analysis pixelated, watermark, 
image, Revit, Enscape, 
Model 3 view render 10 people, pet Learning Rate 
Render image, Energy 
image _6p.m Warmup: 10 
simulation, 6p.m., dusk 


5. OUTPUTS OF ADDITIONAL TRAINED MODEL 


Within the realm of Al-driven image generation, two primary approaches exist: 1) Text-to-Image (txt2img), and 2) 
Image-to-Image (img2img). The txt2img approach generates architectural visualization images based on given 
textual descriptions. This process demands precision in design due to its sensitivity to factors such as the words 
utilized, accurate descriptions, and word arrangement. Nonetheless, this approach offers the advantage of 
efficiently generating realistic images. In contrast, the img2img approach provides the functionality to manipulate 
and enhance images or photographs, facilitating their reprocessing and utilization. In this section, we aim to 
demonstrate the outcomes of the trained additional models using both of these image generation approaches, 
showcasing their capabilities based on the previously conducted additional training. 


5.1 Generation of Visualized Images Using the Text-to-[mage (txt2img) Approach 


To facilitate a comparative analysis between pre and post additional training outcomes, the identical prompts 
utilized during the preceding intensive test were employed. Prompts were categorized into Positive prompts and 
Negative prompts based on their application status. Positive prompts were further classified into Scene Description, 
Geographic Location, Image Quality, and Light Analysis Condition, generating visualization images of spatial 
light analysis tailored to the characteristics defined by these prompts. 


Table 3: Image Generation from Txt2img Approach 


Model Light analysis view render Light analysis view render Light analysis view render 
image _7a.m. image 12p.m. image 6p.m. 
Negative low quality, poor quality, bad proportions, gross proportions, normal quality, bad detail, blurry, foggy, bad quality 
prompt shadow, unreal engine, sketch, ugly, pixelated, watermark, people, pet 
Scene Description 
: Light analysis view, Interior image, Revit, Enscape, Render image, Energy simulation, Livingroom, large window 
Geographic location (gps location data,country) 
Positive : 50, Yonsei-ro, Seodaemun-gu, Seoul, Republic of Korea 
prompt Image quality: low quality, poor quality, bad proportions, gross proportions, normal quality, bad detail, blurry, foggy, bad 
quality shadow, unreal engine, sketch, ugly, pixelated, watermark, people, pet 
Light analysis condition: 7a.m. shiny Light analysis condition: 12p.m. Light analysis condition: 6p.m. shiny 
outside shiny outside outside 
Output 
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5.2 Generation of Visualized Images Using the Image-to-Image (img2img) Approach 


Table 4: Image Generation from Img2img Approach 


Input 


Scene Description 
: Light analysis view, Interior image, Revit, Enscape, Render image, Energy simulation, Livingroom, large 


window 


Prompt Geographic location: (gps location data,country) 
50, Yonsei-ro, Seodaemun-gu, Seoul, Republic of Korea 


Image quality: low quality, poor quality, bad proportions, gross proportions, normal quality, bad detail, 
blurry, foggy, bad quality shadow, unreal engine, sketch, ugly, pixelated, watermark, people, pet 


Light analysis condition: 12p.m. shiny outside 


Output 


6. CONCLUSION 


According to the approach proposed in this study, the utilization of Image Generation AI has yielded the capability 
to generate visualized light analysis images within spatial contexts during the initial design stage, obviating the 
need for simulation processes. Notably, even at the nascent phase characterized by mere design concepts, the 
potential for inferring light influx within a space based solely on text descriptions related to these design concepts 
was evident. This approach has been substantiated by showcasing its capacity to transform not only text but also 
3D rendering images into light analysis visualization images. It is worth noting that this study, with its emphasis 
on visualization from an architectural perspective over a purely engineering one in the light analysis simulation 
process, may have certain limitations. Nonetheless, this approach harbors the potential to construct models that 
yield more precise and accurate light analysis outcomes. Particularly focused on the Stable Diffusion model and 
conducted as an exploratory endeavor into the architectural visualization potential through Image Generation AI 
at its nascent stages, there lies an opportunity to derive more intricate results through the utilization of diverse 
generative AI programs and supplementary functionalities. By incorporating these components, the potential for 
producing even more detailed outcomes is substantial. Furthermore, through the integration of specific 
requirements of light analysis simulations and domain knowledge-based additional training, there lies the potential 
for enhancement, affirming the robust potential of Image Generation AI. This study underscores the substantial 
potential of Image Generation AI by emphasizing its ability to explore the possibilities of architectural 
visualization. 
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ABSTRACT: Directed energy deposition (DED) is a major metal additive manufacturing (AM) technology that is 
increasingly used in many industries due to its ability to manufacture complex components of arbitrary shapes and 
sizes. However, a lack of timely geometry assessment and the consequent geometry control hinders the 
development of DED towards zero defect manufacturing. In this study, a real-time geometry assessment 
methodology is developed for laser pow-der directed energy deposition (LP-DED). A geometry assessment system 
is developed using a laser line scanner capable of inspecting the melt pool area, the just solidified area, as well as 
layer-wise inspection. An image processing method with an encoder-decoder based profile completion network 
was developed to obtain accurate track profile in images from real-time inspection. Experiments have been 
conducted to validate the proposed methodology by depositing multi-layer X-shape objects. 


KEYWORDS: Additive Manufacturing, Directed energy deposition, Real-time geometry assessment, Laser line 
scanning 


1. INTRODUCTION 


Metal additive manufacturing (AM) technologies’ potential to revolutionize the manufacturing industry has not 
only been well-recognized but has inspired many fields [1]. According to the American Society for Testing and 
Materials (ASTM) standard [2], the two main groups of met-al additive manufacturing technologies are directed 
energy deposition (DED) and powder bed fusion (PBF), both of which have been widely applied in, for example, 
the aerospace, automobile, and biomedical industries. In recent years, DED has gained attention as a viable 
manufacturing method in the construction industry where metallic materials are used extensively in distinctive and 
complex designs [3]. Often, traditional techniques such as hot rolling, cold forming, and extrusion can only 
produce regularly shaped, prismatic metallic components [4], which limits the potential use of metallic materials in 
construction and design. DED can complement traditional methods and produce components with almost any 
shapes with high precision. 


However, DED is somewhat plagued by geometry problems [5]. For example, the thicknesses of the deposited 
(“printed”) layers often deviate from their design values [6]. When more and more layers are deposited, heat 
accumulation tends to cause the layers to spread, increasing their width but decreasing their height [7]. In addition, 
as the nozzle comes to deposit the object’s comers or intersections, the resulting geometry also quite often deviates 
from the design [8]. The geometrical dimensions of the deposited object often fail to meet the quality requirement 
or even collapse when geometry deviation occurs during the deposition and is not solved in a timely manner, which 
will lead to time and cost waste. Moreover, an accurate geometry profile is quite helpful for other quality analysis 
during the printing process, such as the online stress measurement [9]. Therefore, it is important that the geometry 
of the printed object is continuously inspected in real time, as more material gets deposited. 


Vision cameras and laser line scanners are often employed to assess the geometry of an object being printed via 
DED [10]. However, though vision cameras can assess the geometry in real-time, only either the track width or 
track height is measured, but not the whole track pro-file [11]. On the other hand, laser line scanners are mostly 
used post-DED, at which point the influence of powder and deposition laser is quite small so that the measuring 
accuracy is high. Besides, previous studies on geometry assessment have tended to focus on single- or multi-layer 
straight line deposition; multi-layer deposition of components with sharp features such as intersections or corners, 
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is important but not well studied. Therefore, the objective of this study is to develop a system capable of 
conducting real-time 3D geometry assessment during multi-layer deposition of objects with sharp features. There 
are four main contributions: (1) A geometry assessment system was developed to achieve both real-time inspection 
and layer-wise inspection during the LP-DED process; (2) In real-time inspection, an image processing method 
including a novel encoder-decoder based profile completion network was developed to obtain an accurate track 
profile; and (3) geometry assessment of multi-layer X-shape deposition has been achieved. 


This paper is organized as follows. Section 2 gives some background on our research, which includes the working 
principle of a laser line scanner, a description of LP-DED and geometry assessment during DED. Section 3 
explains the developed geometry assessment methodology. Section 4 provides the experimental results and 
discussion of the proposed methodology. Section 5 concludes the paper by presenting a summary, limitations, and 
future work. 


2. RESEARCH BACKGROUND 
2.1 Description of laser line scanner 


A laser line scanner is a piece of non-destructive testing (NDT) equipment that has been used successfully to 
capture the shape of an object. The scanner operates based on the principle of laser triangulation as shown in 
Figure 2-2. By selecting the bottom of the sensor as a reference as shown on Figure 2-2, the basic triangulation 
calculation is expressed in equation (1). 


ay 


(1) 


b, = 
1 tana, 


For laser line scanner, instead of projecting a single point, a laser line is projected on the target surface. After that, 
the diffusely reflected light of the laser line is detected by a high-quality sensors array called CMOS sensor matrix. 
Each projected point corresponds to one column on the sensor matrix. Based on the position of the detected laser 
beam on the corresponding column on sensor matrix, the distance of one measuring point to a defined reference in 
the sensor (Z coordinate) and can be calculated via triangulation, and the exact position of each point on the laser 
line (X coordinate) is acquired accordingly. 


Light source 


Z-axis CMOS array 


Receiver 


Target 


Y-axis 


(a) (b) 


Fig. 1: (a) Principle of optical triangulation, (b) Principle of laser line scanner 


In addition, a band filter is embedded right before the CMOS sensor to avoid the reflection of light beyond the 
expected wavelength, and only captures the reflection of the projected laser line. With respect to laser light, red and 
blue laser diodes are commonly available for laser line scanners. The red laser scanner is ideal for common 
measurement tasks especially with extremely dark surfaces whereas the blue laser scanner is ideal for transparent, 
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organic, and red-hot glowing surfaces. In the DED process, when the high power-density laser is focused on a 
continuous stream of metal powder, the substrate becomes red- hot glowing surface. Thus, for this study a blue 
laser diode line scanner is used to achieve an accurate result. In addition, the blue laser is preferred to the red one 
since it is an extremely sharp-focused laser line that does not penetrate the surface. 


2.2 Geometry assessment during DED 


The geometry assessment targets include the geometry of the melt pool area and the just solidified area as well as 
the layer geometry of the deposited object. Much research in recent years has focused on inspecting the 
geometry of the melt pool, since it can be used for real-time geometry control [10]. Camera-based methods are 
popularly used to inspect the melt pool geometry, including vision camera and infrared camera. Vision cameras 
can be used to measure either the width or the height of the melt pool depending on the installation location and 
the target measurement field [11], while infrared cameras are normally used to measure the width and length of 
the melt pool [17]. Some recent papers have attempted to estimate the melt pool height using infrared cameras 
with the help of deep learning methods. However, these methods cannot measure the spatial profile of the melt 
pool area. The inspection of just solidified area is important as it is reported that this can be used for online stress 
estimation [9]. Moreover, compared with the melt pool area, the just solidified area might better represent the 
final geometry of the deposited track since thermal shrinking occurs during the solidification process [18]. The 
just solidified area is quite close to the melt pool area so that some studies use it for real-time geometry control 
though there is bound to be a small lag. Similarly, camera-based methods are used but a spatial profile cannot be 
obtained [19]. While laser line scanners can be used to obtain a special profile [20], its performance are affected 
by powder reflection and high intensity melting laser during LP-DED process, which prevents it from obtaining 
accurate spatial profile, thus it is often used for layer-wise geometry inspection rather than real-time geometry 
inspection. The inspection of layer geometry after printing each layer has been studied as well. This method 
needs to consider inspection path during the design process, which increases design effort and printing time. 
However, it is applicable to all kinds of printing shapes and toolpaths. In addition, for layer-wise control which 
does not need instant feedback, it is a better choice [21]. To address geometry assessment of all three targets, a 
geometry assessment system is developed in this study which aims to achieve both real-time inspection and 
layer-wise inspection of the track profile using laser line scanner. 


Note that the term “real-time” might be ambiguous, as it has been used in a different sense in the literature from 
field to field. In this study, inspections with the laser line of a line scanner located at just a solidified area or 
melting pool area are called real-time inspection, where a small delay relative to the deposition is expected. On 
the other hand, inspection that takes place after each layer has been printed is called layer-wise inspection, since 
normally a large delay exists. 


3. METHODOLOGY 
3.1 Overview of the geometry assessment methodology 


For real-time inspection, images are captured during deposition. Here there are three apparent problems that 
would prevent us from obtaining an accurate profile: (1) powder reflection, (2) melt-influenced area and (3) track 
profile missing, as illustrated in Fig. 4. The following steps are proposed to overcome these three problems. 
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Fig. 4 Problems with raw images: (a) typical profile from real-time inspection of melt pool area, (b) typical 
profile from real-time inspection of just solidified area 


3.2 Powder reflection removal 


Firstly, an image enhancement technique, namely contrast stretching, is applied to improve the contrast in each 
image by converting the original intensity value of a smaller range to a larger range of intensity values, as shown 
in Eq. (3) and Fig. 5. 


s=T(r) (3) 


where r is the input intensity, s is the output intensity, and T is the intensity transformation function. By applying 
this function, the brighter pixels become even brighter, and darker pixels become even dimmer. Since powder 
reflection always has lower intensity, it can remove the pixels of powder reflection. 


Output intensities, s 


Input intensities, r 


Fig. 5 Powder reflection removal: contrast stretching 


3.3 Melt-influenced area removal 


To remove the melt-influenced area, a DBSCAN clustering algorithm is applied on the image. As shown in Fig. 
6(a), for all non-zero pixels in the image, a core pixel is selected if the pixel has n number of neighbors, where 
the neighbors are pixels within a distance ¢ from the core pixel. The distance between two pixels is calculated 


968 


using Eq. (4): 


dy = [y -u + -oY @) 


where (u;,v;) and (u;,v,;) are the U-V coordinates of pixels i and j. A cluster is formed by recursively 
taking a core pixel, finding among all of its neighbor pixels that are core pixels, in turn finding all of their 
neighbors that are core pixels, and so on. After clustering, the cluster with the largest mean v value will be 
removed. For some images, the melt-influenced area will be connected to the track profile, and the track profile 
will be removed as shown in Fig. 6 (b). To solve this problem, a modified DBSCAN is proposed by adding an 
additional coordinate m to represent the distance to the lowest non-zero pixels for each column as illustrated in 
Fig. 6 (c) and the distance between two pixels is calculated using Eq. (5). After that, the DBSCAN algorithm is 
applied, and the result can be seen in Fig. 6(d) 


(5) 


= mi) 


(uj - uj) + (vy; - n) +(m 


e ere 


Core pixel 


u 
Explain DBSCAN clustering 


(a) (b) 


Define cdordinate m 


(c) (d) 


Fig. 6 Melt-influenced area removal: (a) explain DBSCAN clustering on the image, (b) directly apply DSCAN, 
(c) add coordinate m and (d) apply modified DBSCAN 


3.4 Encoder-decoder based profile completion 


After melt-influenced area removal, the track profile is extracted. However, there exists severe profile missing 
problems, which will severely deteriorate the quality of 3D point cloud for geometry assessment. The profile 
missing problem is mainly due to bad reflection on the deposited metal surfaces, especially when there is 
high-intensity laser radiation from the printer. As shown in Fig. 7, the cross-sections at different locations on the 
X-shape represents different types of profile missing, and a pattern can be observed for each track: (1) at the 
beginning of each track (location A), the opposite side of the melt-pool influenced area are missing for the 
one-peak cross section; (2) when approaching the intersection (location B), the line scanner captures two-peak 
cross section and the middle part of the two-peak cross section is missing; (3) at the intersection (location C), the 
top part of the one-peak cross section is missing; (4) when getting away from the intersection (location D), two 


969 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


side parts of the two-peak cross section are missing; (5) at the end of each track (location E), same as in location 
A, the opposite side missing of one-peak cross-section can be observed. For such varied types of profile missing 
problems, traditional completion methods such as curve fitting [22] or interpolation methods cannot achieve 
satisfied result, especially for the middle part missing of two-peak cross section. 


Fig. 7 Track profile missing problem 


To complete the track profile in captured images, an encoder-decoder based profile completion algorithm is 
proposed. The idea of adopting an encoder-decoder based network comes from point cloud completion techniques. 
Encoder-decoder based networks are widely used in 3D point cloud completion tasks [23], [24], as the encoder can 
summarize the geometric information from an incomplete input point cloud to form a feature vector, and, based on 
the feature vector, the decoder will predict the complete shape of the point cloud. The proposed algorithm is 
revised from DeepLabv3+ [25], which is a popular encoder-decoder network with images as input. 
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Fig. 8 Proposed encoder-decoder based profile completion network 
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In this study, the useful information in preprocessed images is the track profile pixels, which take up a small 
proportion of the whole image. Therefore, to make an efficient use of information in the images, the images from 
previous steps are further converted to 1D array and fed as input. A novel deeplab1D network is proposed which 
changes 2D operation in all layers of the DeepLabv3+ network to a 1D operation. Moreover, since previous 
profiles might provide useful information for the completion of current profile, therefore the previous (¢ — 1) 
profiles are considered as input for the proposed network. When ¢ equals 1, only the current profile is considered. 


The architecture of the proposed encoder-decoder based network is shown in Fig. 8. First, the image 
of 140 x 220 pixels from previous steps is converted into a 1 x 220 array. For each column of the array of the 
image, the gravity of the intensity plot on the v axis is taken as one of the values in the 1 x 220 array. The 1 x 
220 array is inputted into the proposed network, which includes an encoder and a decoder. The encoder consists 
of a ResNetID and an ASPP1D module. The ResNet!D module is based on ResNet-101 [26]. The ASPP1D 
module has evolved from Atrous Spatial Pyramid Pooling (ASPP) [27], which conducts several parallel Atrous 
convolutions with different rates. An Atrous convolution is described in Eq. (6). 


yli] =) xli +r- k]w[k] 6) 


k 


where I is the location on the output feature map Y, W is a convolution filter applied over the input feature 
map x. The Atrous rate Y determines the stride that samples the input signal. Note that an Atrous convolution 
with r = 1 is equal to a standard convolution. In the encoder module, the concatenated output from ASPP1D 
is processed by a 1 x 1 convolution and an interpolation by a factor of 4, then concatenated with the convolved 
low-level feature from the ResNet1D. After this concatenation, a few convolutions are applied to refine the 
features followed by interpolation by a factor of 4 to recover the final output shape (a 1 x 220 array). 


3.5 Point cloud generation 


The captured data from real-time inspection and layer-wise inspection is in its local coordinate system. For 
subsequent geometry assessment, such as comparing as-designed geometry with as-built geometry, the captured 
data needs to be converted to the global coordinate system. For real-time inspection, the U-V to X-Z coordinates 
conversion and X-Z coordinate transformation are needed to obtain the point cloud of target cross-section from 
captured images. While for layer-wise inspection, 2D points are collected, thus only X-Z coordinate 
transformation is conducted. 


Since the commercial line scanner normally uses a CCD sensor to capture its emitted laser reflection on the target 
surface, this process can be modeled using a pin-hole model, which is the basis camera model based on the 
perspective projection principle [28]. 


First, intrinsic calibration is conducted, which is to reconstruct the X-Z coordinates in the real-world coordinate 
system, given the U-V coordinates in the image coordinate system using the following equation. 


x] kı” kı? key3"] pu 
a = [koi kaz” kzz’ >| (10) 
1 kzı" k32" kg3'} +1 


By getting several pairs of U-V and X-Z coordinates, a least-square solution can be used, and the intrinsic 
transformation matrix K can be obtained. 


Second, after obtaining the X-Z coordinates in the line scanner’s local coordinate system (X,,Z,), a translation 
matrix is needed to convert the X-Z coordinates into the global coordinate system (Xg, Zg), which is the extrinsic 
calibration process. After installing the laser line scanner, a rectangular calibration bar with known dimension is 
put at the origin of the global coordinate system, thus the as-designed cross section profile of the calibration bar 
in the global coordinate system can be obtained. Finally, after obtaining coordinates of measure profile in 
nozzle’s coordinate system, the 3D coordinates of the measured profile in the global coordinate system can be 
calculated by fusing the nozzle’s position using Eq. (16). 
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Xj] 1 0 tx, > 
= n 
iM -|o eaa (16) 


where the tx, ty, and tz, are the coordinates of the printing position in the global coordinate system, i.e., the 
nozzle’s position. To obtain the nozzle’s position for each captured profile, a log file is extracted from the DED 
printer during deposition which contains the location of the nozzle and a timestamp at each location. 


4. EXPERIMENT AND DISCUSSION 
4.1 Experiment setup 


The LP-DED printer used in this study is an InssTek MX-400, which is a commercial metal printer equipped 
with a 5-degree-of-freedom mechanical moving stage, a Ytterbium fiber laser with a wavelength of 1070 nm, a 
maximum power of 1kW, and a focal laser beam diameter of 800 um, as well as a metal powder delivery system 
with shield gas and carrier gas (Argon gas). A 316 L stainless-steel powder with an average particle size of 100 
um was used for deposition and a substrate using the same material as the powder with dimensions 100 mm x 50 
mm x 10 mm was placed on the moving stage. 


Two laser line scanners (Micro-Epsilon scanCONTROL 3000-25/BL), an inclined line scanner and an upright 
line scanner (Fig. 13), are installed for real-time and layer-wise inspection, respectively. Calibrations are 
conducted for both line scanners to get the transformation matrix used for 3D point cloud generation before 
deposition. For the inclined line scanner, intrinsic and extrinsic calibration are conducted; for the upright line 
scanner, only extrinsic calibration is needed. The exposure time was set to 20 ms (determined based on 
experience) to get a better reflection from the metal surfaces. The deposited object is a 20-layer X-shaped object. 
Two experiments are conducted: Experiment 1: deposition with real-time inspection of melt pool area (d = 
Omm) and layer-wise inspection; Experiment 2: deposition with real-time inspection of just solidified area (d = 
3mm) and layer-wise inspection. 


Fig. 13 Experiment setup: Geometry assessment system 
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4.2 Processing result 


During the deposition of each layer, the inclined line scanner with laser line pointing at the melt pool or just 
solidified area captures and transfers images to the software for processing. After the deposition of each layer, 
layer-wise inspection is conducted by the upright line scanner, and point profiles are collected. For layer-wise 
inspection, 2D points of each cross-section are obtained and 3D point cloud data can be generated using extrinsic 
calibration matrix and fusing printer information. For real-time inspection, powder reflection removal and 
melt-influenced area removal are conducted for each image. Then the developed encoder-decoder based profile 
completion network is used to complete the profile on each image and the U-V coordinates are obtained. Finally, a 
3D point cloud data of the deposition X-shape object in the global coordinate system can be generated using the 
proposed method. 


Two experiments are conducted and each with 20 layers of data collected. Experiment 1 collects 1167 images and 
Experiment 2 collects 1157 images in total. Given the traverse speed of the printer (10 mm/s) and the total tool path 
length of each deposition (993.86 mm), the actual profile rate of the real-time inspection is about 12 frames per 
second. In this study, the paired image data from experiment 1 is used for model training and validation, which is 
divided into training and validation data in a ratio of 0.8, 0.2, respectively. The collected data from experiment 2 is 
used for model testing. The training process was conducted on GPU (NVIDIA GeForce GTX1080) using Python 
3.9, Pytorch 11.7 and CUDA 11.8. Since previous profile is considered, t equals 1 to 5 are considered. For each t 
value, 80 epochs are trained, the validation dataset is used to choose the best model, then the best model is used for 
testing. 


Root mean squared error (RMSE) was calculated between the output prediction array and the ground truth array. 
The average RMSE of all profiles in each layer was obtained for each ¢ for comparison (Table 4). For higher layers, 
the RMSE becomes larger. This is due to the fact that when the layer height becomes higher, the sides of the 
deposited object will have a more inclined angle of 90 degrees, resulting in a worse reflection of the laser from the 
line scanner projected on the side of the deposited object. Thus, there will be a more serious profile missing 
problem for the higher layers, making it more difficult to complete the profile. In addition, when the absolute 
deposition height increases, the network prediction error also tends to increase, but the proportion of the error with 
respect to absolute deposition height may remain the same. Therefore, an RMSE-H ratio is calculated for each 
layer, which is the RMSE error divided by the design height (Table 4). As seen, there is not much difference in the 
RMSE-H ratio for each layer. The RMSE-H ratio of the first few layers is larger. Considering that the profiles of 
the first five layers are not seriously missing and the completion model may not be effective in such cases, the 
profile completion can be applied to the higher layers without the first five layers. When more previous layers are 
involved, there is little difference in the performance of the proposed model. A slight decrease of RMSE can be 
observed when ¢ equals 2. Therefore, the model with ¢ = 2 is finally selected as the profile completion model in our 
geometry assessment system, which can generate point cloud with an RMSE of 0.21 mm compared with the 
ground truth. 


Table 4: Test results of the proposed model for different values of t 


Test result As-designed RMSE (mm) RMSE-H ratio 
height (mm) 

1 |f=2 |t=3 |t=4 |t=5 [t=1 [t=2 Jt=3 | t=4 |t=5 
1 0.2 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 16% | 14% | 16% | 18% | 15% 
2 0.4 0.06 | 0.05 | 0.06 | 0.06 | 0.06 | 14% | 13% | 14% | 14% | 14% 
3 0.6 0.07 | 0.07 | 0.08 | 0.08 | 0.08 | 12% | 11% | 13% | 13% | 14% 

Layer 

4 0.8 0.10 | 0.09 | 0.10 | 0.10 | 0.11 | 12% | 11% | 13% | 12% | 14% 
5 1 0.11 | 0.10 | 0.11 | 0.12 | 0.13 | 11% | 10% | 11% | 12% | 13% 
6 1.2 0.13 | 0.12 | 0.14 | 0.14 | 0.16 | 11% | 10% | 12% | 11% | 13% 
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7 1.4 0.16 | 0.15 | 0.18 | 0.18 | 0.19 | 12% | 11% | 13% | 13% | 13% 
8 1.6 0.16 | 0.17 | 0.18 | 0.19 | 0.20 | 10% | 10% | 11% | 12% | 13% 
9 1.8 0.20 | 0.18 | 0.20 | 0.19 | 0.21 | 11% | 10% | 11% | 11% | 12% 
10 2 0.19 | 0.18 | 0.20 | 0.19 | 0.21 | 9% 9% | 10% | 10% | 11% 
11 2.2 0.22 | 0.20 | 0.22 | 0.22 | 0.23 | 10% | 9% | 10% | 10% | 11% 
12 2.4 0.23 | 0.23 | 0.23 | 0.22 | 0.25 | 10% | 10% | 10% | 9% | 10% 
13 2.6 0.25 | 0.24 | 0.25 | 0.23 | 0.26 | 9% 9% 9% 9% | 10% 
14 2.8 0.28 | 0.29 | 0.28 | 0.30 | 0.31 | 10% | 10% | 10% | 11% | 11% 
15 3 0.27 | 0.27 | 0.28 | 0.30 | 0.33 | 9% 9% 9% | 10% | 11% 
16 3.2 0.31 | 0.29 | 0.30 | 0.31 | 0.33 | 10% | 9% 9% | 10% | 10% 
17 3.4 0.38 | 0.36 | 0.36 | 0.38 | 0.37 | 11% | 10% | 11% | 11% | 11% 
18 3.6 0.40 | 0.38 | 0.39 | 0.39 | 0.39 | 11% | 11% | 11% | 11% | 11% 
19 3.8 0.40 | 0.38 | 0.38 | 0.41 | 0.41 | 11% | 10% | 10% | 11% | 11% 
20 4 0.43 | 0.42 | 0.42 | 0.44 | 0.45 | 11% | 11% | 10% | 11% | 11% 
Overall 0.22 | 0.21 | 0.22 | 0.22 | 0.24 


5. CONCLUSION 


This study has developed a real-time geometry assessment methodology for LP-DED using laser line scanner. A 
geometry assessment system has also been developed to achieve real-time inspection of the melt pool area and the 
just solidified area, as well as layer-wise inspection of the layers’ geometry. An image processing method has been 
proposed including powder reflection removal, melt-influenced area removal and an encoder-decoder based 
profile completion network to obtain track profile on images. Then a point cloud generation method was developed 
including U-V to X-Z coordinates conversion, X-Z coordinate transformation, and 3D point cloud generation by 
fusing printer information. Experiments have been conducted to validate the proposed method and the result shows 
that an average RMSE of 0.21 mm can be achieved from point cloud comparison between the point clouds 
obtained in realtime and obtained layer-wise. The proposed real-time inspection method was able to achieve better 
performance compared with the line scanner’s built-in method, and the developed encoder-decoder profile 
completion model has been validated which outperforms baseline model. The deposition heights of the melt pool 
and solidified layer were compared, and the result showed that the differences were not significant using the 
proposed method. 
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OPTIMAL NUMBER OF CUE OBJECTS FOR PHOTO-BASED INDOOR 
LOCALIZATION 


Youngsun Chung, Daeyoung Gil & Ghang Lee 
Department of Architecture and Architectural Engineering, Yonsei University, South Korea 


ABSTRACT: Building information modeling (BIM) is widely used to generate indoor images for indoor 
localization. However, changes in camera angles and indoor conditions mean that photos are much more 
changeable than BIM images. This makes any attempt at localization based on the similarity between real photos 
and BIM images challenging. To overcome this limitation, we propose a reasoning-based approach for determining 
the location of a photo by detecting the cue objects in the photo and the relationships between them. The aim of 
this preliminary study was to determine the optimal number of cue objects required for an indoor image. If there 
are too few cue objects in an indoor image, it results in an excessive number of location candidates. Conversely, 

if there are too many cue objects, the accuracy of object detection in an image decreases. Theoretically, a larger 
number of cue objects would improve the reasoning process; however, too many cue objects could lead to declining 
object detection performance. The experimental results demonstrated that of two to five cue objects, three cue 
objects is most likely to yield optimal performance. 


KEYWORDS: indoor location determination, BIM, reasoning 


1. INTRODUCTION 


Photos are commonly used as a medium to support building maintenance and defect management (Kim et al., 
2014). These photos are sometimes taken by experienced field workers, but most are captured by unskilled workers 
or individuals with a limited understanding of the building, such as occupants (Kang et al., 2019). In existing 
maintenance systems, users are typically required to manually tag the locations where a photo was taken to utilize 
the system effectively. In particular, when occupants report defects, the specific locations and conditions of the 
defects are often described in unstructured text, making management even more challenging. Various methods 
have been used to accurately determine the locations where photos were taken. Several image-based indoor 
positioning methods have been explored, including approaches which search for the most similar BIM screenshot 
image to a target photo (Ha et al., 2018) or regress the camera position using deep learning algorithms (Acharya 
et al., 2019). Nevertheless, image-based methods typically rely on extensive image training and are sensitive to 
changes in indoor conditions (Kim & Kim, 2023). This sensitivity becomes especially problematic in buildings 
that have multiple varying factors, including interior fittings and lighting. 


To overcome these limitations, especially the sensitivity to changes in indoor conditions, we propose an indoor 
localization method based on reasoning-based localization method that uses cue objects and their spatial 
relationships. Unlike furniture, cue objects, such as doors and windows, can serve as stable reference points 
because they rarely change. To achieve this goal, the first step is to determine the optimal number of cue objects 
required. If there are too few cue objects, they may not provide sufficient information for localization. However, 
if there are too many cue objects, the accumulated accuracy of cue-object detection decreases. The aim of this 
preliminary study was to validate our proposed method by determining the optimal number of cue objects required 
in an image to accurately locate the positions of indoor photos. To select the optimal number, we developed a 
prototype localization method based on the spatial relationships among cue objects, which involved comparing the 
similarities between cue objects and their spatial relationships in a target indoor image with those in a BIM model. 
We evaluated the performance of the proposed method by varying the number of cue objects in an indoor image, 
using the mean probability to accurately determine the location where the image was taken. To validate the 
proposed method, we measured the performance using the mean probability of localization and varied the number 
of objects within the photos. 


This paper consists of five sections. Following this introduction, the second section discusses previous studies 
related to the research. The third section describes the research methodology and explains the details of the 
experiments. The fourth section presents the analysis and results of the experiments, and the final section concludes 
the paper by discussing the main findings, contributions, and limitations of the research. 
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2. BACKGROUND 
2.1 Indoor Localization Using Images 


Recent developments in computer vision have led to many attempts at indoor localization. Ha et al. (2018) 
suggested an indoor localization approach using BIM and a visual geometry group (VGG) model (Simonyan & 
Zisserman, 2015). They used the proposed model to retrieve the most similar BIM screenshot image to a given 
photo and to determine where the photo was taken. Alam et al. (2022) conducted a similar investigation based on 
recurrent neural networks (RNNs) to find the correct position of an indoor camera. These methods were intuitive 
and moderately effective but sensitive to variations in indoor decorations and lighting conditions arising due to 
their reliance on identifying the most similar screenshot images. 


To enhance robustness, developers have attempted to utilize image datasets with camera trajectory data. In the 
context of BIM-PoseNet (Acharya et al., 2019) and related studies, various researchers have trained models based 
on deep convolutional neural networks (DCNNs) and large datasets of indoor BIM screenshot images to determine 
the positions and angles of the cameras used to capture photos. Two such studies were based on RNNs (Acharya 
et al., 2020) and channel-wise transformer localization (CT-Loc; Kim & Kim, 2023). Although BIM-PoseNet and 
CT-Loc applications are robust under varying lighting conditions due to an edge extraction method, they still 
cannot adapt to changes in furniture arrangements or interior decorations. Additionally, the studies were limited to 
the fixed linear paths of the cameras and excluded the simultaneous handling of close-range and wide-angle images. 
However, close-up photos are typically taken to capture the appearance of small-sized defects clearly, but for 
effective building defect management, both wide- and close-range images are required (i.e., photos need to be 
taken from a distance to address defects that cover a wide area or where the spatial context of the defects is crucial). 


2.2 Indoor Localization Using Objects 


To overcome the problem of condition changes in images, several researchers have proposed methods that utilize 
objects within images for indoor localization. Bay et al. (2006) investigated image-based indoor localization using 
speed-up robust features (SURF; Guan et al., 2016) and unique landmarks, such as posters or logos. Similarly, Li 
et al. (2022) used multiple visual landmarks and incorporated smartphone compass readings to improve 
performance. However, using posters as references is not practical because posters may change frequently, causing 
difficulties in keeping indoor landmark databases up to date. 


To overcome these limitations, our author team proposed a method that used semantic segmentation and pose 
estimation for the positions of cue objects in indoor photos. They aimed to identify the indoor location where the 
cue objects in photos and conducted a proof-of-concept study (Kim, 2022). However, the method was only tested 
on objects photographed at relatively short distances. 


In summary, previous localization methods based on images have revealed weaknesses in analyzing images under 
varying conditions. While the methods that employed edge-rendered images and semantic segmentation proved 
helpful in increasing the robustness of localization under different lighting conditions, they were still ineffective 
in capturing changes in interior items or furniture. As a solution, we previously proposed a method that focused 
on cue objects that rarely changed over time, such as light switches and fire extinguishers, and conducted a 
preliminary study (Kim, 2022). To further develop the method, in this study, we conducted a set of experiments to 
determine the optimal number of cue objects required for the method. 


3. RESEARCH METHOD 


The research flowchart for this study is depicted in Fig. 1. The indoor localization method first obtains information 
about cue objects and their spatial relationships using computer vision technology. We trained and validated the 
object detection model on an object detection training dataset using the object types and spatial order of the 
bounding boxes detected to reason indoor locations by comparing them with the spatial relationships among BIM 
cue objects. This method is based on left-right relationships of cue objects. Location reasoning may result in one 
or more specific sets of candidate cue objects. Each candidate represents a potential location depicted in the image. 
We evaluated the performance of the reasoning model based on the probability of accurately determining the 
location where the photo was taken, considering the number of candidates and the object detection accuracy. We 
tested the method with varying numbers of cue objects and found the optimal number when the model achieved 
the highest performance. 
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In the experimental phase, to which this paper relates, we began by determining the types of objects that would be 
used as cue objects and establishing the range of objects present in the image. We then created a sample for the 
BIM model of a housing unit, shown as a BIM DB in Fig. 1, which incorporated 10 different types of cue objects 
across 11 rooms. We generated 12,861 BIM screenshot images and used 12,671 images to train and validate the 
object detection model. We employed the remaining 190 images, each of which included 2-5 cue objects, to 
evaluate indoor localization performance and find the optimal number of cue objects. 


Determine types of cue objects 


Objects that rarely change are selected 
as cue objects. 


Create a sample BIM model 


A BIM model is created with 10 types of 
cue objects in 11 different rooms. 


Detect cue objects from images using 
an object detection model 
11,404 images (training) 

1,267 images (validation) 


Locate images based on the spatial 
relationships of detected objects, 
matching with the BIM model 


190 images with two to five cue objects 


Determine the optimal number 
of cue objects 


Fig. 1: Research flowchart 


3.1 Selection of Cue Objects 
For the experiments, we set the following criteria for cue objects: 


1) The appearances or positions of cue objects should rarely change; thus, transient items, such as tables and 
posters, should not be considered cue objects. 


2) Ideally, objects should be unique to a certain space and representative of the space. However, since a few objects 
remain constant over time and are exclusive to a particular space, non-unique objects, such as light switches or 
doors, could also be considered cue objects. 


To determine positions using the spatial relationships among objects that remained relatively constant, we selected 
objects that fulfilled the above criteria, which resulted in the 10 cue objects listed in Table 1, including three types 
of doors, a window, a power socket, a light switch, a sink, a toilet bowl, a showerhead, and a kitchen cabinet, being 
chosen for the experiments. 


3.2 BIM Model Creation 


We created a sample BIM, we created a model of a housing unit for the experiment. Fig. 2 presents the axonometric 
view and the plan of the unit model. The model consisted of 11 rooms (the living-dining-kitchen [LDK] space, 
bedroom 1, bedroom 2, bedroom 3, bedroom 4, bathroom 1, bathroom 2, pantry, closet, balcony, and entrance), all 
of which had boundary walls, except for the LDK space and the entrance. Although some rooms were significantly 
different from one another (e.g., bedroom 1 and 2), there were also similarities between certain rooms (e.g., 
bedrooms 2 and 3). Table 1 provides detailed room information for the housing unit, including the room number, 
name, and list of cue objects present inside each room, along with their respective quantities. We placed the cue 
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objects in plausible locations and varied their numbers and placements. We classified cue objects with differing 
appearances within the same category as distinct types. For example, we categorized doors into three different 
types. Doors A and B were both indoor wooden doors, but door B differed from door A by having two panels. 
Meanwhile, door C was a steel front door. 


Fig. 2: Axonometric view of the housing unit model (left) and plan of the unit (right) 


Table 1: Room information (number, name, and cue objects) for the BIM model of housing unit 


Room Number Room Name Object Name and Quantity 
1 LDK Door A, 7; Window, 2; Power socket, 8; Light switch, 5; Kitchen cabinet, 1 
2 Bathroom 1 Door A, 1; Power socket, 1; Sink, 1; Toilet, 1; Showerhead, 1 
3 Bedroom4 Door A, 1; Window, 1; Power socket, 1; Light switch, 1 
4 Pantry Door A, 1; Power socket, 1 
5 Closet Door B, 1; Power socket, 1; Light switch, 1 
6 Bedroom 1 Door A, 2; Door B, 1; Window, 1; Power socket, 4; Light switch, 2 
7 Bathroom 2 Door A, 1; Power socket, 1; Sink, 1; Toilet, 1; Showerhead, 1 
8 Balcony Door A, 1; Window, 2; Power socket, 1 
9 Bedroom 3 Door A, 1; Window, 1; Power socket, 1; Light switch, 1 
10 Bedroom 2 Door A, 1; Window, 1; Power socket, 1; Switch, 1 
11 Entrance Door C, 1 


3.3 Image Dataset Preparation 


To train the model and evaluate the performance of the indoor localization method, we generated 12,861 BIM 
screenshot images, of which 12,671 were used for object detection and 190 for indoor localization method 
evaluation. Specifically, we used 11,404 images (roughly 90% of the object detection dataset) for training and 
1,267 images for model validation. The dataset with 190 images for the indoor localization method was labeled 
differently from the previous dataset. The dataset for object detection was labeled according to bounded boxes, 
whereas the dataset for localization was annotated according to the information for the target room. Table 2 shows 
the major characteristics of the two image datasets. We created the dataset for object detection using a script that 
automatically captured the appearance of objects within the BIM model and labeled them accordingly. However, 
we manually created the dataset to validate the indoor localization method by directly capturing BIM model views. 
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Table 2: Characteristics of each image dataset 


Dataset for object detection Dataset for indoor localization method evaluation 
Purpose To train and validate the object detection model To validate the indoor localization method 
Labeling Cue objects with bounded boxes Rooms and associated cue objects 
Data size 11,404 (training)/1,267 (validation) 190 (validation) 
Creation method Automatically captured BIM model views Manually captured BIM model views 
Image size 785 x 785 1024 x 767 


The images used for training and validating the object detection model were square, measured 785 pixels on each 
side, and were rendered in a realistic style. The dataset creation method is depicted in Fig. 3. We employed visual 
scripting to automatically generate these images. The viewing point from the camera’s location and the target point 
where it is directed are both required to capture BIM screenshots. We began the viewing point and target point 
acquisition process by extracting room boundaries from the model and calculating the midpoint of each boundary 
side. We established viewing points by vertically elevating the midpoint of each boundary 150 cm from the floor 
to position the camera at average eye level. Viewing points were positioned along room boundaries rather than at 
the room centroids to capture images from the maximum distance within a room and thereby capture a greater 
number of cue objects. The scale of the captured objects was similar to that of the objects captured in the indoor 
localization method evaluation dataset when the viewing points were positioned along room boundaries. We then 
set target points by vertically elevating the midpoint of each boundary, spanning 40-200 cm, at intervals of 20 cm 
from the floor. To capture the desired views, we positioned a camera with a field of view (FOV) of 50°, which is 
the base angle of a normal lens, at the viewing point and directed it toward the target points. We repeated this 
process for each room in the BIM housing unit model. We produced the initial BIM images using Dynamo. Each 
was 1,047 x 785 pixels and was subsequently cropped into left, center, and right portions to create 785 x 785-pixel 
images. We removed redundant images that did not contain any cue objects. In total, we generated 12,671 images 
and used them to train and validate the object detection model. 


Viewpoints; 150cm moved Captured images (1047x785) 
vertically from the midpoints 


Midpoints of room boundaries 


Target Points; 40cm~200cm moved 
vertically from the midpoints 


Fig. 3: Dataset creation for the object detection model 


Additionally, we manually generated 190 images with varying numbers of cue objects, ranging from two to five, 
to determine the optimal number of cue objects for an image. The images were divided based on the number of 
cue objects (n) present in each image. We determined the range of n based on the following rationale: To deduce 
the location based on the spatial relationship between objects, at least two cue objects were minimally required. 
For the maximum number of cue objects, assuming that all cue objects were accurately detected, having more cue 
objects in the image made it easier to accurately determine the location. However, it was very unlikely for an image 
to include more than five cue objects. Moreover, as the number of cue objects necessary for the proposed method 
increased, the cumulative object detection error rate increased accordingly. Therefore, we set the maximum n- 
value to 5. Table 3 provides the distribution of images across the rooms. For scenarios with 2, 3, or 4 cue objects, 
50 images were captured for each scenario. For scenarios with 5 cue objects, 40 images were captured, due to the 
limited existence of views that met the criteria. We determined the number of images for each room based on the 
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availability of the desired view and the size of the room. Initially, we assessed whether each room could provide a 
view with a certain number of cue objects within the FOV of 50°. Then, considering the available rooms, we 
allocated the number of images for each room proportionally based on their respective areas. As shown in Table 3, 
the number of available rooms decreased significantly as the number of cue objects in the image increased. To 
fulfill the research objective, the image needed to include all contiguous cue objects. Further details regarding this 
condition will be discussed in Section 3.4. 


Table 3: Number of images taken for each room 


Number 1 2 3 4 5 6 7 8 9 10 11 Total 
Room Area (m°) 60 5 14 5 5 27 5 6 13 12 8 160 
information 
Cue object 
3 5 5 2 3 10 5 5 5 5 1 69 
quantity 
n=2 20 2 5 0 2 9 2 2 4 4 0 50 
Number of n=3 20 2 5 0 2 9 2 2 4 4 0 50 
mapes n=4 29 2 0 0 0 13 3 3 0 0 0 50 
n=5 34 0 0 0 0 6 0 0 0 0 0 40 


3.4 Object Detection 


We used You Only Look Once (YOLO) (Redmon et al., 2016) for this study, which is one of the most widely used 
networks for object detection. We trained the model using 11,404 images and set aside 1,267 images for model 
validation. To rationalize the indoor location and the spatial relationships among cue objects within an image, we 
applied the trained object detection model to the indoor localization method evaluation image dataset, which varied 
the number of cue objects contained in each image. 


3.5 Indoor Location Reasoning 


The goal of indoor location reasoning is to determine the locations at which the positional relationships between 
cue objects obtained through object detection in the images align with the positional relationships the cue objects 
have within the BIM model. This involves analyzing the X and Y coordinates of the bounding boxes of cue objects 
in the images to infer whether one object is to the left or right of another object, or above or below it. However, in 
this experiment, we specifically focused on the left-right relationships between cue objects, as they tended to have 
fewer variations and provided greater accuracy. Based on the object detection results, we created a cue object list 
by arranging the objects in ascending order according to their X-coordinate values. 


To identify the locations where the BIM information matched the information from the image, we considered how 
the model’s information would manifest in the image. To determine which objects could be observed to the left or 
right of a specific object when taking a photo, we employed clockwise ordering of the objects present in each room. 
First, we extracted the positions of the cue objects and the room boundaries, which enabled us to determine the 
locations and relationships of cue objects within each room. Based on the extracted information, we sequentially 
listed all the objects in a clockwise direction along the boundaries of each room. Subsequently, to ensure that the 
list represented the relationships between objects, regardless of the starting point, we copied the elements from the 
front of the list, counting one less than the number of objects found in the image, and added them to the end of the 
list. If the quantity of elements added to the list exceeded the count of objects identified in the picture minus one, 
it could potentially result in duplicates during localization. Conversely, if the number of added elements was less 
than the count of objects identified in the picture minus one, it could lead to potential omissions during localization. 
Fig. 4 depicts an example. If the cue objects in a room were arranged clockwise as a, b, c, d, and e, the original list 
was [a, b, c, d, e]. If the image contained three cue objects, additional elements of the list ‘a, b’, which was one 
less than the total number of cue objects in the image, were appended at the end of the list, resulting in [a, b, c, d, 
e, a, b]. Matching parts were then sought between the cue object list created for each room and the list of cue 
objects present in the image. The matched cue objects were considered candidates for the location from which the 
photo was taken. For instance, if the cue object list for room A is [a, b, c, d, e, a, b], and the cue object list for the 
image is [a, b, c], there is one matching object arrangement. Therefore, [a, b, c] inside room A ([a, b, c, d, e, a, b]) 
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becomes a candidate location. There could be multiple candidates for each image, or no candidates if the object 
detection result was incorrect. 


Spatial information from BIM model Object detection result 


Room object lists of all rooms: 


[[c, b, a, ¢, d), [d, b, h, b, k), [d, b, h, b, k), 
Tae] 


=... 
Additional elements: 


[c.b, a, e, d], [d,b, h. b. k}, [d b, h. b.k}, | 


Number of cue objects: 
3 


Door A (d) 


Showerhead (c) 


t 


(ab, c, d, el] 


The quantity of additional elements: 
3-1=2 


Changed Room object lists of all rooms: 


Room object list: [a, b, c, d, e] ilc, b, a, e, d, c.b) 


[d, b, h, b, k, d, b], Cue object list from the image: 
i b, h, b, k, d, b], [a, b, c] 
a Location Candidate 


Matching location: 
[[c, b, a, e, d, c, b], 
[d, b, h, b, k, d, b], 
[d, b, h, b, k, d, b], 


{a,b ¢, d, e, a, b)) 


Fig. 4: The process of object location matching 


4. RESULTS 
4.1 Object Detection 


Table 4 presents the model’s performance metrics. The object detection model used in the study achieved mAPo 5 
of 0.9403 and F1 score of 0.9595 (Table 4). Accurate object detection within images certainly resulted in improved 
performance in subsequent indoor localization tasks because our proposed method relies on the results of object 
detection; however, the performance tended to degrade exponentially as the number of objects within the images 
increased as long as the object detection performance reached 100%. The results in Table 4 show that the overall 
performance decreased, theoretically, by about 6% on average with the addition of each cue object, considering 
the mAPo.s performance. The proposed localization method also relies on location reasoning, which is positively 
influenced by increases in cue objects. Thus, the number of cue objects, whether too large or too small, can be 
detrimental to performance, emphasizing the importance of optimizing the number of cue objects. In addition, Fig. 
5 illustrates that the method performed better for objects with less skewness and larger sizes. Due to the presence 
of only one door B in the BIM model with significant skewness in the image, the accuracy was low for this object. 
The accuracy for light switches (the smallest of the objects) was slightly lower than for the other objects. Based 
on the results, it appears that choosing larger cue objects for the localization method would probably have resulted 
in improved performance. 


Table 4: Performance of the object detection model based on the validation dataset 


Precision Recall F1 score mAP_0.5 


0.9914 0.9295 0.9595 0.9403 
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Fig. 5: Confusion matrix for the object detection results 


4.2 Optimal Number of Cue Objects 


Table 5 shows the evaluation results for the indoor localization method based on object detection and location 
reasoning for each number of cue objects in the images. Since the first step of the proposed method is to correctly 
detect every cue object in an image, only cases of every cue object in an image being predicted correctly are 
considered correct cases. As the number of objects in the image increased, object detection accuracy tended to 
decrease. This decrease in accuracy was minimal when the number of cue objects (n) changed from 2 (0.88) to 3 
(0.80). The magnitude of the decrease was exponential, resulting in an accuracy of 0.20 when n = 5. The magnitude 
of the decrease was more significant than the theoretically estimated 6% decrease for the addition of an object, 
which was based on the object detection model performance for an individual object. 


To identify the correct location among the candidate locations, the object detection results must be accurate. If 
object detection yields incorrect results, there may be no correct candidates or no candidates present. However, if 
the object detection results are correct, then at least one of the generated candidates is guaranteed to be correct. 
Therefore, after performing object detection, we conducted location reasoning for cases where there was a correct 
candidate and counted the number of generated candidates. Assuming that there was a correct candidate among 
the generated candidates, we calculated the probability of finding the correct location. However, in cases where 
the object detection results are wrong, there may be no correct candidate. Therefore, we multiplied the probability 
of finding the correct location when the correct location was present among the candidates by using object 
detection accuracy to calculate the probability of locating the correct position using the given method. 


We used three metrics to evaluate the performance of the indoor localization method: (A) the mean number of 
location candidates when the correct location was present among the candidates, (B) the mean probability of 
finding the correct location when the correct location was present among the candidates, and (C) the mean 
probability of finding the correct location for both cases when the correct location was present among the 
candidates and when it was not. The three metrics provided answers to three research questions: (A) How many 
location candidates will be generated, depending on the number of cue objects? (B) What is the probability of 
finding the correct location from among the location candidates if the object detection is conducted correctly? (C) 
What is the probability of finding the correct location with the proposed method, considering both object detection 
accuracy and the performance of location reasoning. Metric (C) was the primary metric for assessing the model’s 
performance. To compare the performance of the model with and without the influence of object detection, we 
considered the results based on predicted and actual object information. (A) decreased as the number of cue objects 
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in the image decreased and reached 1.000 when n = 5, and (B) could be calculated as the mean value of the inverse 
of (A), with a higher value indicating better performance. Since (C) considered both the object detection accuracy 
and the localization reasoning performance, it could be calculated by multiplying the object detection accuracy 
with (B); hence, a higher value of (C) indicated superior overall model performance. 


When cases with correct object detection were considered, the localization performance increased as n increased. 
However, localization performance did not increase proportionally to n because the accuracy of object detection 
significantly decreased as n increased. Therefore, when using the predicted object information, the highest 
performance was observed at n = 3, with a probability of 0.283 for finding the correct location. At n = 4, the 
probability of finding the correct location was 0.276, which represented a slight decrease but showed a similar 
performance to that at n = 3. Therefore, even a slight improvement in object detection accuracy for n = 4 had the 
potential to yield a better score than the case at n = 3. This result shows that the presence of three to four cue 
objects in the image yielded optimal results within the given framework. 


Table 5: Model evaluation results 


Number of cue Object detection Localization performance based on Localization performance based on actual 
objects in the accuracy predicted object information object information 
image (n) 
(A) (B) (©) (A) (B) © 
2 0.88 3.841 0.260 0.229 3.800 0.263 0.263 
3 0.80 2.825 0.354 0.283 2.816 0.355 0.355 
4 0.52 1.885 0.531 0.276 1.700 0.588 0.588 
5 0.20 1.000 1.000 0.200 1.000 1.000 1.000 


(A) The mean number of location candidates when the correct location was present among the candidates 


(B) The mean probability of finding the correct location when the correct location was included among the 
candidates. 


(C) The mean probability of finding the correct location for both cases when the correct location was present 
within the candidates and when it was not. 


5. CONCLUSION 


Many previous studies on indoor localization have been based on the similarities between photos and BIM images. 
However, these approaches may exhibit weaknesses under varied lighting conditions and with different wallpapers 
and furniture locations. To overcome these limitations, we propose a reasoning-based approach based on cue 
objects in photos. With this approach, it is essential to optimize the number of cue objects for detection since it 
significantly influences performance. Hence, the aim of this preliminary study was to find the optimal number of 
cue objects in a photo that yielded the best localization performance. The proposed localization method uses spatial 
information on cue objects detected by a computer vision algorithm to locate shots by analyzing the spatial 
relationships among the objects found in an image and comparing them with those in the BIM model. We evaluated 
the method’s performance by assessing the probability of accurately determining the location from which a photo 
was taken, varying the number of cue objects in each photo from two to five. 


The experimental results indicated that the model showed the best performance when three cue objects were 
present in an image. When the number of cue objects increased to four, the probability of accurately determining 
the exact location decreased slightly compared to the case with three cue objects, mainly due to the dramatic 
decrease in object detection accuracy. As the number of cue objects captured from the image increased, the number 
of small and skewed objects also tended to increase, which led to a decrease in overall accuracy. Having a higher 
number of cue objects can make it easier to deduce the location of a shot accurately, but it may decrease object 
detection accuracy. 


The major contribution of this study lies in suggesting the optimal number of cue objects that should be present in 
images to determine the locations of photos shot in indoor spaces. This finding highlights the importance of 
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balancing detection performance and reasoning capability in object detection-based indoor localization and 
considering the number of detected objects. The experimental results provide insights into areas for improvement 
in future research. First, the method did not perform well when cue-object arrangements in rooms were similar. 
This limitation could be addressed by considering the size of cue objects and incorporating more diverse spatial 
information. Second, the error rate accumulated at each stage: the cue-object detection stage, the spatial 
relationship detection stage, and the location deduction stage. Further research is expected to improve the proposed 
method to a practically applicable level. The results of this research will be integrated into construction 
management and maintenance software, enabling the automatic tagging of locations in provided images. 
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ABSTRACT: The construction industry is currently witnessing a transformative period characterized by the 
convergence of the green and digital transitions. The green transition seeks to address environmental challenges 
such as climate change and resource depletion, while the digital transition leverages advanced technologies to 
enhance construction processes. This paper specifically explores the integration of green roofs, as component of 
sustainable buildings, into the Building Information Modeling (BIM) framework, a key enabler of the digital 
transition. Green roofs, known for their environmental benefits, consist of layers that contribute to energy 
efficiency, stormwater management, and biodiversity enhancement. To optimize their design and performance, this 
research employs Dynamo Visual Programming Language (VPL) within Autodesk Revit to create parametric 
models of green roofs. These models facilitate the evaluation of thermal and structural characteristics under 
varying water content conditions (dry and saturated). Results reveal that the choice of substrate and drainage 
materials significantly impacts thermal resistance, particularly in dry conditions. However, in saturated conditions, 
the influence on thermal performance converges, emphasizing the importance of structural considerations in both 
scenarios. The research also highlights various limitations and outlines avenues for future studies, including 
expanding the range of materials, exploring additional performance metrics, and incorporating AI and machine 
learning techniques. By addressing these aspects, this research contributes to a comprehensive understanding of 
the integration of green roofs and BIM. It provides designers and researchers with a practical tool for optimizing 
green roof designs, aligning with contemporary sustainable construction practices, and promoting the holistic 
development of green buildings. 


KEYWORDS: Sustainability integration; Parametric modeling; Digital Transformation 


1. INTRODUCTION 


The construction sector is currently undergoing significant transformations driven by the green and digital 
transitions. The green transition refers to the shift towards sustainable practices and environmentally friendly 
solutions within the industry (Mina et al., 2021). This transition is motivated by the urgent need to address climate 
change, resource depletion, and environmental degradation (Bherwani et al., 2022). As a result, the construction 
sector is increasingly adopting strategies and technologies that reduce the environmental impact of buildings and 
infrastructure. Simultaneously, the digital transition has brought about a profound change in the construction 
industry, fueled by the rapid advancements in digital technologies (Huang et al., 2021). This transition involves 
the integration of digital tools, processes, and data management systems to improve efficiency, productivity, and 
collaboration across all stages of the building life cycle (Giovanardi et al., 2023). Building Information Modeling 
(BIM) has emerged as a key component of the digital transition, revolutionizing the way information is shared, 
analyzed, and utilized within the construction industry. 


Among the various components employed in green buildings, green roofs have gained recognition as an effective 
technological solution for improving sustainability and the life cycle performance of buildings (S. Cascone, 2022). 
A green roof, also known as a living roof or vegetated roof, refers to a roofing system that incorporates vegetation, 
growing medium, and waterproofing layers (Vijayaraghavan, 2016). It offers numerous environmental, social, and 
economic benefits, making it an integral part of green building practices (Shafique et al., 2020). Each layer of the 
green roof system works in tandem to provide a range of benefits. The waterproofing layer ensures the building's 
protection, while the root barrier prevents potential damage. The drainage layer manages stormwater, preventing 
flooding and alleviating pressure on drainage infrastructure. The growing medium layer supports plant growth by 
providing adequate nutrients and moisture retention. Finally, the vegetation layer enhances biodiversity, improves 
air quality, reduces energy consumption, and mitigates the environmental impact of the building. 


The digital transition has ushered in a new era of possibilities and advancements in the construction industry, 
necessitating the adoption of digital tools and processes to optimize project outcomes (Mehrbod et al., 2019). At 
the forefront of this transition is Building Information Modeling (BIM), a powerful technology that revolutionizes 
the way information is managed, shared, and utilized throughout the building life cycle (Wang et al., 2019). BIM 
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is a collaborative process that involves creating and managing digital representations of physical and functional 
characteristics of a building. It enables stakeholders, including architects, engineers, contractors, and facility 
managers, to work together in a coordinated manner, streamlining communication and enhancing decision-making. 
BIM serves as a digital repository of information, encompassing 3D models, 2D drawings, specifications, 
schedules, and other pertinent data related to the building (Gimenez et al., 2015). 


One of the key advantages of BIM lies in its ability to support the supply, integration, and management of 
information throughout the entire life cycle of a building (Alireza et al., 2017). During the design phase, BIM 
facilitates the creation of detailed 3D models that allow for visualization, clash detection, and simulation of various 
design scenarios. Therefore, to fully capitalize on the advantages of green roofs, the integration of these systems 
with BIM is of utmost importance. Previous research (Korol et al., 2019) discussed the integration between green 
roof and BIM technologies used in the engineering design of such systems. This previous research explained that 
to create a BIM object of an extensive green roof system, complex programs such as AECOsim, ARCHICAD, IFC, 
Revit, and Vectorworks are needed. The authors also mentioned that the NBS National BIM Library in the UK sets 
an industry standard for quality, efficient generic and manufacturers’ objects, including green roof systems. 
However, they did not provide detailed information on the integration process between green roof and BIM. Other 
authors (Yu et al., 2017) discussed the application of BIM in the case study of green roof innovation. Specifically, 
the authors incorporated BIM and energy consumption analysis software to demonstrate the benefits of the 
proposed eco-innovative green roof alternative. Finally, in a further study (Kasmion et al., 2000) reported on a 
simulation study using Autodesk Revit BIM software to investigate how types of roof design and green roof 
application may reflect on container’s heat absorption. This study found that a curved roof surface with green roof 
produces a better heat absorption quality. 


While there is a growing recognition of the benefits of green roofs and the potential of BIM in the construction 
industry, there is a notable gap in research that specifically explores the integration of these two domains. The gap 
in scientific knowledge lies in the lack of established frameworks, guidelines, and best practices for effectively 
integrating green roofs within the BIM environment. There is limited research that examines the specific 
workflows, data management strategies, and computational automation techniques required to successfully 
incorporate green roofs into the BIM framework. Furthermore, the understanding of how green roof components, 
materials, and performance characteristics can be accurately represented and analyzed within the BIM environment 
is also lacking. 


In this paper, the focus lies in studying the integration methods between green buildings and BIM, specifically 
emphasizing the incorporation of green roofs. Green roofs, also comprising innovative components and products, 
such as recycled polyethylene, offer unique opportunities to enhance the sustainability of buildings. To achieve 
this integration, the Dynamo Visual Programming Language (VPL) workflow within Autodesk Revit, the most 
widely used BIM authoring software, was employed. Dynamo enabled computational automation, allowing for 
the development of parametric and informative models of green roofs. These models provided computational 
automation for determining the thermal and structural characteristics of the different green roof technologies in 
different water content conditions (dry and saturated) that can be used for controlling and coordinating the entire 
life cycle of a green roof, especially during the initial design stage. 


This research aims to contribute to a deeper understanding of how the combined power of green building practices 
and BIM technology can foster sustainable development in the construction sector. 


2. MATERIALS AND METHODS 


The research is focused on investigating the integration methods between green buildings and Building 
Information Modeling (BIM), with a specific emphasis on the incorporation of green roofs, and it involves the 
computational modelling using the Dynamo Visual Programming Language (VPL) workflow within Autodesk 
Revit. The parametric models enable the manipulation of design parameters, such as green roof technologies and 
water content conditions (dry and saturated), to assess their impact on thermal and structural characteristics. 


2.1. Green roof technologies 


The material characteristics of the drainage layer and substrate in green roofs were evaluated through previous 
experimental studies. Three types of drainage layers and substrates were considered in this study. In terms of the 
drainage layers, commercially available granular products such as perlite and expanded clay were examined. 
Additionally, previous research proposed recycled polyethylene as a potential drainage layer for green roofs, 


989 


aiming to enhance sustainability while reducing environmental and economic impacts associated with production 
and transportation. 


Regarding the substrates, three different compositions were investigated. Substrate S1 consisted of lapilli, pumice, 
zeolites, peat, and slow-release fertilizers. Substrate S2 comprised a mixture of mineral volcanic materials 
combined with organic substances, while Substrate S3 was formulated with a higher percentage of organic matter 
compared to the other substrates to increase water retention. It was also composed of locally available materials. 


Following laboratory tests conducted in previous research, the thermal and physical characteristics of the materials 
used for both the drainage layer and substrate were considered under dry and saturated conditions. In a green roof 
system, saturated conditions are reached after a rain or irrigation event, while dry conditions are obtained when 
the water has completely evaporated during prolonged droughts. As evidenced by Table | and Table 2, altering the 
water content in the materials resulted in changes to their thermal and physical properties. 


These evaluations provide valuable insights into the performance of drainage layer and substrate materials, offering 
a comprehensive understanding of their behavior under different water content conditions. This knowledge is 
essential for optimizing the design and performance of green roofs, enabling informed decision-making and 
promoting sustainable practices within the construction industry. 


Table 1: Thermal and physical properties for drainage layer materials (Cascone & Gagliano, 2022). 
Dry conditions Saturated conditions 


Thermal conductivity Density Thermal conductivity Density 


[W/mK] [kg/m3] [W/mK] [kg/m3] 
Perlite 0.076 164.2 0.312 510.5 
Expanded clay 0.124 410.4 0.234 579.3 
Recycled polyethylene 0.098 329.4 0.144 411.7 


Table 2: Thermal and physical properties for substrate materials (S. Cascone & Gagliano, 2023). 


Dry conditions Saturated conditions 


Thermal conductivity Density Thermal conductivity Density 


[W/mK] [kg/m3] [W/mK] [kg/m3] 
Substrate S1 0.113 1000.2 0.463 1355.5 
Substrate S2 0.134 919.4 0.458 1358.4 
Substrate S3 0.084 605.4 0.418 1183.9 


2.2. Computational modelling with Dynamo Visual Programming Language (VPL) 


To achieve the integration of green buildings and Building Information Modeling (BIM), the research employs the 
Dynamo Visual Programming Language (VPL) as a key tool within Autodesk Revit. Dynamo VPL enables 
computational automation and facilitates the development of parametric and informative models specifically 
tailored to green roofs. 


The initial step involved modeling the materials utilized for green roof layers within the Dynamo environment. 
This was accomplished by duplicating existing materials from the Revit Material's Asset and renaming them using 
custom nodes based on Python scripts previously developed within Dynamo. This approach was necessary as 
standard nodes do not directly manipulate the Revit Material's Asset. 


The "Thermal conductivity" and "Density" nodes provide the property values under both dry and saturated 
conditions, as indicated in Table 1 and Table 2 (Fig. 1 depicts the Dynamo workflow for creating the perlite 
drainage material). The "If" node automatically switches the thermal conductivity and density values between dry 
and saturated conditions. In this research, "true" represents dry conditions and "false" represents saturated 
conditions. For example, in the case of perlite, when the water condition is set to dry, the thermal conductivity is 
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0.076 W/mK and the density is 164.2 kg/m?. Conversely, when the water condition is set to saturated, the thermal 
conductivity becomes 0.312 W/mK and the density changes to 510.5 kg/m’. 


This aspect of the workflow is particularly noteworthy as it enables the computational automation of thermal and 
physical properties for green roofs during the design stage, depending on the water content. With just a single click, 
users can assess the thermal and physical performance of green roofs under dry or saturated conditions, as 
described later in the research. 


The "Material.SetThermal" node is responsible for creating new materials, such as "Perlite" in Fig. 1, with 
properties defined in "Thermal.SetProperties" that vary based on the input from the "True/False" node positioned 
at the beginning of the workflow. 


The workflow illustrated in Fig. 1 for creating perlite serves as the basis for generating the other drainage and 
substrate materials. Fig. 2 showcases the workflow with all the materials created. All the "Thermal conductivity" 
and "Density" nodes are connected to the same "True/False" (dry/saturated) node, ensuring the water content 
condition is automatically considered. 


2.3. Thermal and structural characteristics 


The first step involves assigning the green roof materials to the "Green roof" type, which consists of two layers: 
one for the substrate and another for the drainage layer. The impact of waterproof and anti-root membranes, as 
well as the filter layer, on the thermal and structural characteristics of the green roof is negligible and, therefore, 
not considered. 


Fig. 1: Material creation in Revit by using Dynamo workflow. 


To facilitate the evaluation of the thermal and structural characteristics of the green roof, custom nodes were 
developed (Fig. 3). The "Index" parameter indicates the layer position, with "0" representing the substrate (upper 
layer) and "1" representing the drainage layer (lower layer). Since the thickness of the materials plays a crucial 
role in determining the performance of the green roof, the workflow incorporates a component that automatically 
adjusts the material thicknesses. 


Given the focus on extensive green roofs in this research, the substrate thickness varies between 10 cm and 20 cm, 
while the drainage layer thickness ranges from 4 cm to 6 cm. As an average value, a substrate thickness of 15 cm 
and a drainage layer thickness of 5 cm were adopted. 
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Fig. 3: Layer creation into the green roof type and parametric thickness modelling. 


By selecting the appropriate materials within the "Substrate" and "Drainage layer" nodes, the new materials are 
automatically assigned to the Revit model, ensuring seamless integration and representation of the green roof 
components. 
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Finally, Fig. 4 illustrates the workflow employed to evaluate the thermal and structural characteristics of green 
roofs. In terms of thermal performance, the thermal resistance was a key factor considered. This property was 
automatically measured by Revit and imported into Dynamo, considering the thermal conductivities of the various 
substrate and drainage materials, which are influenced by the water content conditions (dry or saturated), as well 
as the material thicknesses. 
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Fig. 4: Thermal and structural characteristics of green roof evaluation. 


Regarding the structural performance, the weight of the different green roof configurations was determined. The 
total weight depends on the density of the substrate and drainage layers, which are also influenced by the water 
content conditions and the material thicknesses. By multiplying the density of the materials by their respective 
thicknesses, the total weight of each green roof configuration was calculated. 


According to the European standard, if a roof is intended to be walkable during the design stage, it should be able 
to withstand a maximum load of 200 kg/m?. The total weight of the different green roof solutions was compared 
to this limit to assess their compatibility with existing buildings, aiming to avoid costly structural modifications. 


By considering both the thermal and structural characteristics aspects, this workflow provides valuable insights 
into the suitability of different green roof options, facilitating informed decision-making during the design stage. 
It enables designers and researchers to assess the thermal efficiency and structural integrity of green roofs, ensuring 
their compatibility with existing building structures and meeting the required standards. 


3. RESULTS AND DISCUSSION 


Table 3 and Table 4 present the results for green roofs in dry and saturated conditions, respectively. In terms of 
thermal resistance in dry conditions, the highest performance was observed when combining Substrate S3 with 
perlite (2.44 m2K/W), while the lowest performance was measured when Substrate S2 was coupled with expanded 
clay (1.52 m2K/W). These results highlight the significant impact of substrate and drainage material combinations 
on the thermal performance of green roofs in dry conditions. Therefore, in dry conditions, the materials tested have 
similar thermal performance of 3-cm insulation materials. 
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Table 3: Thermal and structural characteristics in dry condition. 
Perlite Expanded clay Recycled polyethylene 


Thermal Resistance Weight Thermal Resistance Weight Thermal Resistance Weight 


[m?K/W] [kg/m] [m?K/W] [kgm]  [m?K/W] [kg/m?] 
Substrate S1 1.99 158.2 1.73 170.5 1.84 166.5 
Substrate S2 1.78 146.1 1.52 158.4 1.63 154.4 
Substrate S3 2.44 99.0 2.19 111.3 2.30 107.3 


Table 4: Thermal and structural characteristics in saturated condition. 


Perlite Expanded clay Recycled polyethylene 


Thermal Resistance Weight Thermal Resistance Weight Thermal Resistance Weight 


[m2K/W] [kg/m2] [m2K/W] [kg/m2] [m2K/W] [kg/m2] 
Substrate S1 0.48 228.8 0.54 232.3 0.67 223.9 
Substrate S2 0.49 229.3 0.54 232.7 0.67 224.3 
Substrate S3 0.52 203.1 0.57 206.6 0.71 198.2 


In saturated conditions, the thermal resistance decreased due to the higher thermal conductivity of water. As a 
result, the thermal performance of different green roofs became similar, with an average value of 0.55 m2K/W. 
These findings indicate that under saturated conditions, the thermal performance of green roofs is less influenced 
by the specific substrate and drainage material combinations and tends to converge to a similar performance level 
across the tested variants. This resistance value is close to the one measured for natural materials, such as wood, 
straw, etc. Designers can consider these average values during the design stage to estimate the energy performance 
of green roofs in terms of energy consumption. 


Regarding the structural performance, all green roof configurations exhibited weights lower than the imposed limit 
overload of 200 kg/m2 in dry conditions, with the lighter solution being the Substrate S3, due to its composition, 
when coupled with perlite as drainage layer. The heaviest solution is the Substrate S1 in combination with 
expanded clay. However, in saturated conditions, only when Substrate S3 was coupled with recycled polyethylene 
as drainage materials, the weight remained below the limit overload due to hygroscopic structure of the granular 
materials used for the green roof. In fact, the recycled plastic does not absorb water differently from perlite and 
expanded clay. This finding is significant, particularly for the retrofitting of existing buildings, as it highlights the 
importance of considering the structural performance of green roofs not only in dry conditions, as is often the case, 
but also in saturated conditions. 


The workflow created using Dynamo within Revit proved to be effective in automating the determination of 
thermal and structural characteristics of green roofs during the design stage. This automation allowed for seamless 
transition between dry and saturated conditions by adjusting the material properties accordingly. The ability to 
rapidly assess these characteristics enables designers to make informed decisions during the early design stage. 


Designers can employ the algorithm to explore various green roof configurations and materials, considering both 
thermal resistance and structural weight. By inputting different parameters into the Dynamo workflow, such as 
substrate types and drainage materials, designers can optimize green roof designs for specific project requirements. 
For instance, if the primary goal is to maximize thermal resistance while keeping structural weight within a certain 
limit, the algorithm can assist in identifying the most suitable combinations of materials. 


Furthermore, the algorithm's flexibility extends to various climate conditions and building types. Designers can 
use it to assess the performance of green roofs in different regions, taking into account variations in temperature, 
precipitation, and structural load requirements. This adaptability empowers architects and engineers to tailor green 
roof designs to meet energy efficiency goals and structural integrity standards in diverse contexts. 


Overall, the results demonstrate the importance of considering both thermal and structural characteristics of green 
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roofs, not only in dry conditions but also in saturated conditions. The integration of the developed workflow using 
Dynamo and Revit provides a practical and efficient means to assess and compare the performance of green roof 
options, facilitating informed decision-making during the design stage. These findings contribute to the 
understanding and optimization of green roof designs in terms of energy consumption and structural integrity. 


By incorporating this algorithm into the design process, architects and engineers can enhance the sustainability of 
buildings by leveraging green roofs as energy-efficient and structurally viable components. This technology-driven 
approach aligns with contemporary design practices that prioritize eco-friendly solutions while maintaining 
building performance standards. 


4. LIMITATIONS AND FUTURE DEVELOPMENTS 


This section outlines the limitations of the current study and presents directions for future research to address these 
limitations. 


Future research should include real-world case studies to provide a more concrete understanding of the algorithm's 
practical use and effectiveness. These case studies can demonstrate how the Dynamo Visual Programming 
Language (VPL) workflow within Autodesk Revit can be effectively applied to model green roofs in various 
construction projects, thus contextualizing the algorithm within practical design scenarios. 


The analysis in the current study primarily focused on thermal resistance and structural weight as key performance 
metrics. However, a broader range of performance indicators, including water retention, stormwater management, 
biodiversity enhancement, and acoustics, should be explored in future research. This will enable a more 
comprehensive evaluation of green roof performance, considering their contributions to sustainable construction 
from multiple angles. 


While the current research addressed the design phase of green roofs and their integration with BIM, it did not 
extensively explore the construction and maintenance phases. Future research should encompass these phases to 
gain a holistic understanding of green roofs' performance, durability, and maintenance requirements throughout 
the entire building lifecycle. 


To ensure the robustness and adaptability of the methodology, future research should consider alternative 
methodologies and tools. This diversification will accommodate different research questions and potential 
variations in results. Additionally, exploring interoperability with other BIM software platforms will enhance the 
methodology's relevance and applicability. 


Integration of artificial intelligence (AI) and machine learning techniques within the BIM environment should be 
explored in future research. This will enable the optimization of green roof designs, prediction of performance 
outcomes, and data-driven recommendations for materials and configurations, aligning the work with emerging 
trends in construction technology. 


Future research should involve the development of standardized guidelines and protocols for integrating green 
roofs within the BIM framework. These guidelines can streamline data exchange, model interoperability, and 
collaboration among stakeholders, promoting efficient and consistent implementation of green roof projects. 
Additionally, analyzing the economic aspects, including life cycle costs, return on investment, and financial 
incentives, should be a focus. Assessing the economic benefits of green roofs in terms of energy savings, improved 
building performance, and increased property value will provide valuable insights for decision-makers. 


To promote holistic and integrated sustainable building solutions, collaborative research efforts should be initiated 
to explore potential synergies between green roofs and other sustainable building strategies, such as renewable 
energy systems, water conservation measures, and smart technologies. Integrating these strategies within the BIM 
framework will contribute to a more comprehensive approach to sustainable construction, thereby addressing the 
de-contextualization issue raised by the reviewer. 


Incorporating these considerations into future research agendas will provide a more comprehensive and 
contextualized view of the integration of green roofs and BIM, addressing the reviewer's concerns and enhancing 
the practicality and relevance of the work. 
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5. CONCLUSIONS 


In conclusion, this research contributes to a deeper understanding of the integration methods between green 
buildings and Building Information Modeling (BIM), with a specific focus on the incorporation of green roofs. By 
employing the Dynamo Visual Programming Language (VPL) workflow within Autodesk Revit, computational 
automation was achieved, enabling the development of parametric and informative models of green roofs. 


The analysis in dry and saturated conditions provided valuable insights into the thermal and structural 
characteristics of different green roof technologies. The findings highlight the importance of substrate and drainage 
material combinations in influencing thermal resistance in dry conditions, as well as the significance of considering 
structural performance in both dry and saturated conditions. 


However, it is important to acknowledge the limitations of this research, including the specific focus on certain 
green roof technologies, the limited scope of performance metrics, and the emphasis on the design stage. Future 
developments can address these limitations and further advance the integration of green buildings and BIM: 


e Exploring a wider range of green roof technologies and materials. 


e Investigating additional performance metrics related to water retention, stormwater management, 
biodiversity enhancement, and acoustics. 


e Extending the research to include the construction and maintenance phases. 
e Considering alternative methodologies and software tools. 


e Integrating artificial intelligence and machine learning techniques, developing standardized guidelines, 
assessing the economic aspects, and exploring synergies with other sustainable building strategies. 


By addressing these limitations and pursuing future developments, the integration of green buildings and BIM can 
be further optimized, contributing to the advancement of sustainable development in the construction sector. 
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ABSTRACT: Researchers have long focused on disaster resilience to mitigate calamity disruption. Disaster 
resilience is a complex and multi-faceted concept that is challenging to measure. Quantitative methods have 
traditionally been used to assess disaster resilience, but a growing interest in qualitative methods like open-ended 
interviews has emerged to understand experiences and perspectives. To gain deep and consistent knowledge, an 
open-ended interview should focus on an interviewee s point of view and ask follow-up questions from a knowledge 
base that consists of relevant information; otherwise, this can lead an open-ended interview to deviate from the 
interviewee s point of view to the interviewer 5 point of view. In contrast to what is desired, individual interviews 
with last year's students in the field of civil engineering with a predefined and limited knowledge base 
demonstrated inconsistency in asking a follow-up question from an already existing open-ended interview. To 
tackle this gap, firstly, we suggest a knowledge base that can be built from peer-reviewed papers published in the 
disaster resilience field; secondly, we suggest a Natural Language Processing based Decision Support System 
using Sentence Embedding that can analyze the interviewee 5 response and find resources from the knowledge base 
to assist the interviewer in making a consistent follow-up question. 


KEYWORDS: Disaster resilience; Decision support systems; Open-ended interviews; Knowledge management; 
NLP 


1. INTRODUCTION 


Disaster resilience is a critical aspect of construction technology that plays a pivotal role in mitigating the impacts 
of various natural and human-induced hazards on built infrastructure (Malalgoda, Amaratunga, & Haigh, 2014). 
In recent years, there has been an increasing emphasis on enhancing disaster resilience in the construction industry 
due to the rising frequency and intensity of disasters worldwide (Harrison & Williams, 2016). Ensuring the 
resilience of constructed facilities not only safeguards public safety but also minimizes economic losses and 
facilitates rapid recovery in the aftermath of disruptive events (Ouyang, Duefias-Osorio, & Min, 2012). 


The challenges posed by disasters necessitate a comprehensive understanding of the factors influencing resilience 
in the context of construction projects. Traditional research methodologies, such as closed-ended interviews and 
surveys, have been instrumental in gathering valuable data on disaster resilience (Cai et al., 2018). However, these 
methods often fall short in capturing the full depth of participants' experiences and viewpoints, leading to potential 
biases in data collection. 


The specific research objectives of this paper are as follows: 


1. To investigate the impact of consistency in open-ended interviews on disaster resilience measurement 
within the disaster resilience domain. 

2. To develop and implement advanced Natural Language Processing (NLP) based Decision Support System 
(DSS) with sentence embedding techniques to enhance data collection in open-ended interviews. 

3. To create a knowledge base that aggregates and organizes peer-reviewed papers and experts' insights 
related to disaster resilience in construction projects. 


The research questions guiding this study are: 


Research Question 1: How does consistency in open-ended interviews influence the reliability and depth of data 
collected for disaster resilience measurement in construction technology? 
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Research Question 2: Can leveraging NLP and sentence embedding techniques enhance the contextual relevance 
of follow-up questions in open-ended interviews within the construction technology domain? 


Research Question 3: How does the proposed decision support system, empowered by the knowledge base, 
improve data collection and analysis in open-ended interviews on disaster resilience in construction projects? 


This paper addresses the significance of consistency in open-ended interviews concerning disaster resilience 
measurement within the domain of construction technology. We recognize the limitations of conventional 
interview techniques and aim to enhance data collection by leveraging advanced NLP and sentence embedding 
techniques. By utilizing a knowledge base of relevant topics in the field of disaster resilience, our proposed 
approach generates contextually relevant follow-up questions that align more closely with the interviewee's point 
of view. 


The contributions of this work are threefold. In this research, first, we demonstrate the existent level of 
inconsistency in disaster resilience measurement domain. Next, we introduce a knowledge base that aggregates 
and organizes peer-reviewed papers and experts' insights in the mentioned domain. This knowledge base empowers 
our decision support system to identify and generate pertinent follow-up questions for interviewees, facilitating a 
more nuanced understanding of their perspectives. Last, we leverage state-of-the-art NLP and sentence embedding 
techniques to ensure the semantic similarity between the interview responses and the knowledge base, enabling a 
more accurate assessment of disaster resilience. 


Using a decision support system is one method of reducing cognitive errors. To help people with complicated 
decision-making tasks, DSSs offer tools and cognitive aids, minimizing reliance on memory and cognitive 
processes alone (Arnott, 2006). DSS assists people in avoiding biases, mistakes, and oversights that may result 
from impaired cognitive function or flawed heuristics by offloading cognitive burden and offering organized 
advice. Such decision support systems can be implemented and used to improve human performance and decision 
outcomes in a variety of domains. 


In the following sections, we detail our methodology, including the data collection process, the implementation of 
sentence embedding and NLP algorithms, and the evaluation of our decision support system. We also present the 
results of our experiments and discuss their implications for the construction technology field. Ultimately, we 
believe that our approach holds great promise in improving the consistency and depth of data collected from open- 
ended interviews, thereby advancing the measurement, and understanding of disaster resilience in construction 
projects. 


2. LITERATURE REVIEW 


As discussed, interviews serve as the primary data gathering method for disaster resilience measurement. Moreover, 
open-ended interviews offer valuable insights into individuals' perspectives; however, the variation in follow-up 
questions among different interviewers can lead to inconsistency and reduced reliability of data gathered. This 
section examines the focus of existing solutions in various domains, particularly in healthcare, where NLP has 
been applied to assist in decision-making processes. Additionally, the lack of existing any DSS that utilizes NLP 
to aid in interview processes within the domain of disaster resilience will be highlighted. 


The literature review section follows a systematic literature review process as described by (Y. Xiao & Watson, 
2019), with step-by-step details presented in Fig. 1. The literature review commenced by defining a set of keywords, 
namely (NLP OR (natural AND language AND processing)) AND (dss OR (decision AND support AND system*)) 
AND interview to cover the scope of our research. Wildcard characters and special terms were employed to identify 
relevant papers. The keywords were used to search in the abstract, keywords and titles of peer-reviewed papers. 
The search yielded 32, 8, 3, and 11 papers from Scopus, IEEE, ScienceDirect, and PubMed databases, resulting in 
a total of 54 papers. By following Fig. 1, the reasoning and numbers of each step is discussed. 
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Records identified through 
Identification database searching 


n=54 


Records after duplicates removed Records excluded with reason (n = 3) 


ast Using social media instead of interview (n=1 
Screening The article was not about natural language 


Records after screening processing (n=1 
The article does not describe a decision support 
n=38 system (n=1 


Records excluded with reason (n = 11) 


The article was not utelising any natural 
language processing methods (n=8 
Review of other related works (n=1 

NLP was only discussed as a future work (n=1) 
NLP was only listed without further details(n=1) 


Papers after full-text eligibility 
Eligibility checking 


Studies included 


n=27 Additional records found through forward 
Inclusion backward searches 


n=9 


Total number of studies included 


n=3% 


Fig. 1: Literature review's strategy applied by following Xiao’s systematic literature review process (Xiao and 
Watson 2019). 


The list of selected papers, along with their domain classifications, is presented in Table 1. The classification was 
done by finding relevant keywords to a specific field that an NLP based DSS was designed for. The classification 
categories included healthcare, engineering, HR, law, and business domains. Healthcare classification was related 
to any health-related papers and engineering ones were the papers mostly focusing on engineering fields like 
mechanical engineering, constructions, and related topics. Any paper within the concept of law, court, and 
advocacy sat within the classification of law leaving HR related ones for hiring related topics and the only business 
one addressing an NLP based DSS within an enterprise. 


Table 1: List of selected papers from systematic literature review. 


Author and Year Title Domain 
(Bazzan, Echeveste, Formoso, An Information Management Model for Addressing Residents’ Complaints Engineering 
Altenbernd, & Barbian, 2023) through Artificial Intelligence Techniques. 

(Afshar et al., 2023) Deployment of Real-time Natural Language Processing and Deep Learning Healthcare 


Clinical Decision Support in the Electronic Health Record: Pipeline 
Implementation for an Opioid Misuse Screener in Hospitalized Adults. 


(Sultanum, Naeem, Brudno, & ChartWalk: Navigating large collections of text notes in electronic health Healthcare 
Chevalier, 2022) records for clinical chart review. 

(Yadav & Sharma, 2023) A novel automated depression detection technique using text transcript. Healthcare 
(Lau, Zhu, & Chan, 2023) Automatic depression severity assessment with deep learning using Healthcare 


parameter-efficient tuning. 


(Huang, Liu, & Lee, 2023) Talent recommendation based on attentive deep neural network and implicit HR 


relationships of resumes. 


(J. Wang et al., 2022) PhenoPad: Building AI enabled note-taking interfaces for patient encounters. Healthcare 


(Chaichulee et al., 2022) Multi-label classification of symptom terms from free-text bilingual adverse Healthcare 


drug reaction reports using natural language processing. 


(Fujimori et al., 2022) Acceptance, Barriers, and Facilitators to Implementing Artificial Intelligence- Healthcare 


Based Decision Support Systems in Emergency Departments: Quantitative 
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and Qualitative Evaluation. 


(Barale, 2022) Human-Centered Computing in Legal NLP An Application to Refugee Status Law 
Determination. 
(Rachana, Vishwas, & Priyanka, HR based Chatbot using Deep Neural Network. HR 
2022) 
(C. Wang et al., 2022) A Multi-modal Feature Layer Fusion Model for Assessment of Depression Healthcare 
Based on Attention Mechanisms. 
(Flores, Tlachac, Toto, & Transfer learning for depression screening from follow-up clinical interview Healthcare 
Rundensteiner, 2022b) questions. 
(Flores, Tlachac, Toto, & AudiFace: Multimodal Deep Learning for Depression Screening. Healthcare 
Rundensteiner, 2022a) 
(X. Yang, Joukova, Ayanso, & Social influence-based contrast language analysis framework for clinical Healthcare 
Zihayat, 2022) decision support systems. 
(Jan et al., 2021) The role of machine learning in diagnosing bipolar disorder: Scoping review. Healthcare 
(Jenkins et al., 2021) User testing of a diagnostic decision support system with machine-Assisted Healthcare 
chart review to facilitate clinical genomic diagnosis. 
(Barr et al., 2021) An Audio Personal Health Library of Clinic Visit Recordings for Patients and Healthcare 
Their Caregivers (HealthPAL): User-Centered Design Approach. 
(Toto, Tlachac, & Rundensteiner, Audibert: A deep transfer learning multimodal classification framework for Healthcare 
2021) depression screening. 
(Ivanchikj, Serbout, & Pautasso, From text to visual BPMN process models: Design and evaluation. Business 
2020) 
(Uttarwar, Gambani, Thakkar, & Artificial intelligence based system for preliminary rounds of recruitment HR 
Mulla, 2020) process. 
(Bautista, Aló, & Wang, 2020) Deep Learning, Cloud Computing for Credit/Debit Industry Analysis of Law 
Consumer Behavior. 
(Z. Xiao, Zhou, Chen, Yang, & Chi, IfI hear you correctly: Building and evaluating interview chatbots with active HR 
2020) listening skills. 
(Berquand et al., 2019) Artificial Intelligence for the Early Design Phases of Space Missions. Engineering 
(Mai et al., 2018) Modeling Security and Privacy Requirements: a Use Case-Driven Approach. Law 
(Kramer & Drews, 2017) Checking the lists: A systematic review of electronic checklist use in health Healthcare 
care. 
(Saloun, Ondrejka, Maltik, & Personality disorders identification in written texts. Healthcare 
Zelinka, 2016) 
(Højen, Elberga, & Andersena, SNOMED CT adoption in Denmark-why is it so hard? Healthcare 
2014) 
(Ku & Leroy, 2014) A decision support system: Automated crime report analysis and classification Law 
for e-government. 
(Bagheri, Ensan, & Gasevic, 2012) Decision support for the software product line domain engineering lifecycle. Engineering 
(Huang et al., 2011) Lessons learned in improving the adoption of a real-time NLP decision Healthcare 
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support system. 


(Santelices et al., 2010) Development of a hybrid decision support model for optimal ventricular assist Healthcare 


device weaning. 


(Young et al., 2007) Runtime application of Hybrid-Asbru clinical guidelines. Healthcare 


(Sharda, Das, Cohen, & Patel, 2006) Customizing clinical narratives for the electronic medical record interface Healthcare 


using cognitive methods. 


(Warren, 1998) Better, more cost-effective intake interviews. Healthcare 
(Warren, Warren, & Freedman, Interviewing expertise in primary care medicine: A knowledge-based support Healthcare 
1994) system. 


Fig. 2 illustrates the distribution of covered domains, with healthcare prominently represented. While other 
domains are gaining attention, healthcare remains the dominant focus in NLP-based DSS research. Of particular 
significance, the inclusion of disaster-related keywords in our search strategy consistently yielded zero papers, 
underscoring the absence of NLP-based DSS designed for disaster-related open-ended interviews. Hence, this 
paper addresses the imperative need for such a system and provides a solution to bridge this gap in research. 


Business: 2.8% 


Law: 11.1% > 


HR: 11.1% ~ 


Engineering: 8.3% — 


~ Healthcare: 66.7% 


WB Healthcare MM Engineering MHR Law W Business 


Fig. 2: Percentage of papers using NLP for DSS by domain. 


3. METHOD 


This paper introduces a two-stage design aimed at enhancing disaster resilience open-ended interviews. Initially, 
open-ended interviews were conducted with selected participants using a limited knowledge base. Each participant 
was provided with two open-ended questions along with their respective answers. The participants were then asked 
to generate follow-up questions based on the provided knowledge base. This stage aimed to assess the current level 
of discrepancy in existing open-ended interviews. The second stage presents our designed framework, an assistant 
tool, aimed at enhancing the open-ended interview process. This framework incorporates a modifiable and decent- 
sized knowledge base. Additionally, we propose the utilization of an NLP technique to facilitate the decision- 
making process by offering suggestions to the interviewer. 


Ideally, a follow-up question should align with both the knowledge base and the interviewee’s response, unaffected 
by any other factors. In this scenario, the interviewer's role is that of a mediator between the knowledge base and 
the interviewee. Nonetheless, as highlighted by (Gluyas & Morrison, 2014) “human beings are error prone, and 
the flaws are inherent in human cognitive processes, which are exacerbated by situations in which the individual 
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making the error is distracted, stressed or overloaded, or does not have sufficient knowledge to undertake an action 
correctly”. In this paper, this cognitive error is referred to as Interviewer’s Perception. In Fig. 3, which is our 
perception of Jameel’s design for an open-ended interview (Jameel, Shaheen, & Majid, 2018), the interviewer’s 
perception can be seen as the extra factor highlighted in red which should either be eliminated or reduced to certain 
degree for data collection to be more reliable. 


Select a topic to ask 
questions from 


Start of interview 


Answer the question 


Interviewee 


Interviewer 


Intermewer's percepton 
Mi 


Should ask ' Ask a follow-up question 
a follow-up from the knowledge base 
question? relevant to the answer 


Any other 
topic 
questions? 


End of interview 


Knowledge 
base 


Fig. 3: Process of an open-ended interview based on our perception from Jameel’s work (Jameel, Shaheen, & 
Majid, 2018) 


An interview was strategically designed to assess the impact of interviewer’s perception on the process of asking 
follow-up questions during open-ended interviews in the domain of disaster resilience measurement. The interview 
protocol comprised the following steps: 


e Participants: Thirteen students with civil engineering academic backgrounds were recruited for the study. 
Population sample size was determined based on the number of papers published in 2022 with the 
keyword “open AND ended AND disaster* AND resilient*” in Scopus, which yielded thirteen papers. 
The Cochran’s formula for small population sizes was applied with a confidence level of 95% resulting 
in a sample size of 13 (Nanjundeswaraswamy & Divakar, 2021). 

e Interview protocol: Two sets of open-ended questions with answers were developed to elicit rich data of 
decision-making in open-ended interviews of disaster resilience measurement. The interview protocol 
included prompting the participants to elaborate on their responses and give reasoning for their decision- 
making thoughts. Each student is supposed to select two topics for each set of open-ended questions. 

e Knowledge base: A specific knowledge base for the research domain was created by selecting the top 
twelve topics of peer-reviewed papers from Scopus, aligning with the keywords used in the open-ended 
questions in step 2. The topics were directly extracted from the papers. The purpose of the knowledge 
base was to enhance the interview process by narrowing down the choices for follow-up questions and 
reducing the need for students to possess prior knowledge for asking such questions. 


3.1 Analysis 


In our structured interview design, we recognize the practical constraints of interviewers reviewing numerous 
options during the interview process. As a result, we restricted the number of topics to a manageable dozen for 
each question. Each topic is supposed to cover a chain of thoughts from the interviewee’s point of view. However, 
acknowledging that a dozen topics present limitations, it becomes evident that such a limited number may not 
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encompass every potential point of view expressed by interviewees during an open-ended interview. In practice, a 
more substantial number of topics would be necessary to encompass a broader range of existing knowledge and 
adequately cover an interviewee's perspectives. Conversely, with a larger knowledge base, the probability of each 
choice being seen and selected diminishes. 


To calculate the probability, we can use the complement rule. To calculate the probability of selecting at least one 
topic in two chances equals to the complement of not selecting a topic in two chances. If we consider the number 
of topics as n, the probability of not selecting a topic in one chance is (n-1)/n, and the probability of not selecting 
a topic in two chances is (n — 1)/n x (n — 2)/(n — 1). Using the complement rule, the probability of selecting 
at least one topic in two chances can be calculated as (Lefebvre, 2009): 


Probability of a topic selection in two chances (PTS) = (1 — (n — 1)/n x (n — 2)/(n— 1)) x 100 


Simplifying further: 


n n-2 200 
prs =(=-"—) x 100 = — (1) 
n n n 


Considering our interview scenario with only a dozen topics, the rounded value of Probability of Topic Selection 
(PTS) is 16.67. Obviously, the greater the n, the lower the probability of a choice to be selected and with only two 
topics, each of them will have the probability of 100% to be selected. Let us now examine the selection process 
from the knowledge base, which is executed by the interviewer to choose a follow-up topic. 


In the perfect scenario, we would assume that every student only selected a pair of topics and no other topics for 
the follow-up questions. However, in case of reality, which probably differs from the perfect scenario, we will 
consider the most selected choice added to the second most selected as the probable answer and the number of 
times that they were selected as PA and number of times that other choices were made as SC. Thus, we can simply 
calculate the ratio of discrepancy by using the following formula: 


Discrepancy Ratio (DR) = SC/(PA + SC) x 100 (2) 


A lower DR indicates a closer approximation to a perfect interview with minimal errors, approaching a DR score 
of zero. Since we had two sets of questions, we measured them both separately and reported the result with the 
average of them DRs as we put an equal weight on each of the questions. The maximum value of DR can only be 
achieved if each topic for follow-up question topic is selected exactly once or twice. In this case, PA will be equal 
to 6 (3 for the most selected plus 3 for the second most selected choice) and SC will be equal to total number of 
votes which is 26 (13 students and each of which could select 2 topics) minus the rest of the votes which is 20. By 
applying the formula, the result will be 77%. 77% error is a significant value that can impact data collection; thus, 
measuring DR in a real-case scenario is important and furthermore it implies the significance of this study. It 
should be noted that the value of DR can fluctuate between 0% to 77% with the median of 39%. 


In order to comprehensively assess an interview's thoroughness, consistency, and the presence of discrepancies in 
the selection of follow-up question topics, we have devised a novel metric. This simple metric involves the 
multiplication of PTS and DR, with lower values indicating a more valid and reliable interview. We term this 
metric 'Interview's Inconsistency Mark' and it is calculated as follows: 


Interview's Inconsisteny (IIC) = (PTS x DR)/100 (3) 


The reason that we multiply the values is the importance of DR being zero. It means that if the two obvious topics 
will be selected, it doesn’t matter what is the probability of each topic. Considering our designed interview, the 
Interview's Inconsistency (IIC) can vary between zero and 12.84 (approximately 13). An IIC value of 13 indicates 
an interview with highly unreliable data gathering due to inconsistencies in the interviewer's follow-up questioning. 


4. FINDINGS 


In this section, the obtained results from the interview described in the previous section will be reviewed. The 
outcome of this interview provides insights into the practical aspects of conducting open-ended disaster resilience 
measurements by various interviewers. The interview, which simulates a real-case scenario of open-ended disaster 
resilience measurement, can demonstrate how significant inconsistencies can be in real-world, further implying 
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the need of our designed decision support system. Furthermore, an exemplified DR value will be presented to 
facilitate a comprehensive understanding and performance comparison of our devised framework. Additionally, 
our framework will be introduced, aimed at assisting interviewers in formulating more consistent follow-up 
questions during open-ended disaster resilience interviews. 


Prior to delving into the interview results and our framework, an essential concept gleaned from the literature 
review emphasizes the significance of a robust knowledge base during interview processes. In qualitative and 
open-ended interviews, interviewers are more adept at formulating relevant questions when equipped with 
pertinent information and prior knowledge of the subject (Kallio, Pietilä, Johnson, & Kangasniemi, 2016). As the 
interviewee sees the interviewer as educated and well-prepared, it aids in building trust and rapport. A strong 
knowledge base also enables the interviewer to go into complex subjects in more depth, pose probing questions, 
and elicit perceptive responses. This in turn aids in the collection of reliable data during interviews. Therefore, it 
is essential for interviewers to have access to a broad knowledge base which for instance, it is made from literature 
reviews, professional consultations, and in-depth studies to conduct effective and relevant interviews. 


In addition to a solid knowledge base, interviewers must exercise caution when posing follow-up questions to 
avoid making arbitrary assumptions. Follow-up questions serve the purpose of elucidating or further examining 
specific aspects of the interviewee's response. However, by phrasing their follow-up questions based on their own 
beliefs or preconceived notions, interviewers unwittingly introduce bias or influence the interviewee's answers 
(Hunt, 2009). The objectivity and dependability of the interview data may suffer as a result. Interviewers should 
approach follow-up questions with an open and impartial mindset, allowing the interviewee's perspective to guide 
the dialogue and mitigate potential bias. Interviewers can foster a more accurate and thorough grasp of the 
interviewee's experiences and opinions by actively listening, refraining from asking leading questions, and keeping 
conscious of personal biases. Referring to Wreathall, the skill of avoiding cognitive human errors in such decision 
making can be achieved by investing a lot of time and effort and they need constant investment (Wreathall & 
Reason, 1992). 


4.1 Findings from the conducted interview 


In our designed interview, the first question, the first question yielded 11 and 7 selections for the most and second- 
most preferred choices, respectively. This implies a Disaster Resilience (DR) value of 30.77, and with the pre- 
calculated PTS value of 16.67, the Interview’s Inconsistency (IIC) equals 5.13, indicating a moderate level of 
discrepancy within the range of 0 to 13. On the other hand, in our second designed interview, we obtained 12 and 
9 selections for the most and second most selected choices. Upon applying the formulas, we derived an IIC of 1.35 
representing a favorable level of data reliability and reduced inconsistency within the range of 0 to 13. These 
results indicate values falling within the lower half of the normal distribution (between 0 and 13) concerning real- 
world open-ended disaster resilience measurement interviews. However, this does not negate the possibility of 
encountering discrepancies near values such as 5.13, which align with the median value of 6.5. While these 
numbers serve as indicators, they emphasize the need for caution regarding inconsistency, which can potentially 
undermine the validity of the collected data. 


4.2 Our proposed framework 


To address this concern, our designed framework incorporates two essential steps. First, a knowledge base which 
has enough knowledge related to disaster resilience for the moment that an interviewee gives an answer to a 
question and the interviewer needs to ask a follow-up question from it. Second, a decision-making technique for a 
follow-up question selection from the designed knowledge base. 


Historically, an interviewer’s prior knowledge has been primarily regarded as the knowledge base. This has been 
one of the roots causes where we detected the inconsistency of open-ended interviews in disaster resilience 
measurement exists. Therefore, our primary objective in designing the framework was to establish a 
comprehensive knowledge base. We have identified two primary avenues for obtaining information: the existing 
literature on disaster resilience measurement and the expertise of disaster resilience measurement specialists. 
Given the considerable effort required to access and elicit knowledge from diverse experts, we deemed the first 
option more feasible. We opted for Scopus as our literature database, utilizing automation to extract all relevant 
peer-reviewed papers based on specific keywords pertaining to open-ended questions in the field of disaster 
resilience measurement. This automation allows us to expand beyond a limited number of topics to access 
thousands of thoroughly researched papers, ensuring a reliable and extensive knowledge base. During our 
preliminary tests of the automation system, we successfully retrieved a maximum of 1500 peer-reviewed papers 
for each set of question keywords that were given to the students in our conducted interview. To run queries against 
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the knowledge base, we indexed the knowledge base with Anserini’s Information Retrieval library for Python, 
called Pyserini, that has a low latency for information retrieval (P. Yang, Fang, & Lin, 2017). As a result, the PTS 
value from formula 3.1.1 equates to 0.13, implying an extremely low probability of a topic being selected from a 
knowledge base of this magnitude, assuming equal weightage for each topic. Conversely, this approach instils a 
higher confidence level in our knowledge base, encompassing a diverse range of topics that interviewers can 
choose from, aligning more closely with the interviewee's perspective. 


In the context of decision-making techniques and drawing from relevant papers in the literature review section, 
we propose a methodological approach that utilizes Sentence Embedding techniques to generate follow-up 
questions aligned closely with the interviewee's perspective. Sentence Embedding is an NLP problem that deals 
with identifying text that have similarities based on context, meaning and subject etc. based on which classification, 
generation, syntactic parsing etc. of the text can be done (Ryu, Kim, Choi, Yu, & Lee, 2017).Given this definition, 
it becomes evident why this technique captured our interest. Combined as one of the most recent advancements in 
the field of NLP, it considers a sentence as a whole and find similarities, which in our case, sentence embedding 
plays the role of an interpreter in finding similar topics from a knowledge base of relevant topics in the domain of 
disaster resilience. To query the indexed knowledge base, we used T5 doc2query since it has the primary advantage 
of low retrieval latency, keeping an open-ended interview’s follow up question generation to be in real-time 
(Nogueira, Yang, Lin, & Cho, 2019). We utilized cosine similarity to measure semantic similarity between the 
embeddings. The top one percent of highest-ranking topics were made available for the interviewer’s selection. 
With these considerations, assuming a population of fifteen interviewers using our decision support system, and 
the algorithm identifying 1500 topics for a follow-up question, with a selection of two topics from the top one 
percent (15 topics), the resulting DR value could range from zero to 86.67. Although we analyzed 1500 topics, we 
did not observe significant progress in terms of DR. Nonetheless, it is essential to acknowledge that the user can 
modify the one percent value, and 1500 topics represented one of the highest feasible retrieved numbers from 
Scopus. Nevertheless, the IIC value from formula 3.1.3 in this context will vary between zero and 0.11, which 
stands in stark contrast to the actual interview with an IIC of 5.13, and even the lowest recorded case with an IIC 
value of 1.35, as well as the worst-case scenario with an IIC value of 13. The minimal fluctuation achieved 
represents a significant level of consistency for future open-ended interviews in the domain of disaster resilience 
measurement. 


4.3 Conclusion, future works and limitations 


In this article, we demonstrated the significance of consistency of open-ended interviews in the domain of disaster 
resilience measurement. Furthermore, with a methodological approach, we aimed to address the potential 
limitations of open-ended interviews by utilizing sentence embedding techniques and introducing a knowledge 
base to generate contextually relevant follow-up questions. By leveraging sentence embedding techniques and a 
knowledge base generated from peer-reviewed papers, our approach enables interviewers to gather more 
comprehensive and contextually relevant data during open-ended interviews. This enhanced data collection 
process leads to a deeper understanding of participants' experiences and viewpoints, facilitating better-informed 
decision-making for disaster resilience measures in construction projects. 


Moving forward, potential avenues for future research and development include expanding the knowledge base 
by involving experts to review and contribute their insights. Additionally, continuous advancements in Natural 
Language Processing (NLP) algorithms offer opportunities to improve the performance and efficiency of the 
sentence embedding technique used in our system. Further research can explore the integration of additional data 
sources and domains to enhance the decision support system's versatility and applicability in diverse construction 
technology contexts. 
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ABSTRACT: Semantok Dam located in Semantok River Stream, Nganjuk District, East Java. Dominated by 
lowlands and mountains, the 1900-hectare fertile agricultural land will be irrigated by this nominated “The 
Longest Dam in Southeast Asia”. The construction of this three kilometers long dam requires enormous resources 
of rockfills as the dominant material to build the main dam body. While the process of excavation, mobilization, 
and material settings are the dominant contributor aspects of the projects carbon footprint, at the same time this 
project encounter a challenge on insufficiency of existing quarry. This situation drives a comprehensive strategy 
not only to find the most efficient and accessible material, but also to minimize and mitigate environmental 
damage, ultimately by reducing the material carbon footprint. Thus, an innovative engineering solution is applied 
to overcome this challenge such as utilizing the available material in surrounding project site which is random 
rock soil by using geotechnical analysis tools for design optimization and material usage simulation also 
collaborating with Building Information Modeling (BIM) to visualize and calculate the estimated cost. Eventually, 
this analysis plays a big role in ensuring the environmental sustainability in an infrastructure project by deciding 
the appropriate alternative which produce the least carbon emission. 


KEYWORDS: BIM, Carbon Footprint, Engineering Analysis, Resource Management, Sustainability 


1. INTRODUCTION 


Embodied carbon represents the million tons of carbon emissions released during the lifecycle of infrastructure 
building materials; including extraction, manufacturing, transport, construction, and disposal. Concrete, steel, and 
insulation are all examples of materials that contribute to embodied carbon emissions [1]. Furthermore, other 
activities like excavating and earthmoving materials like rocks, soil, sand, and other similar substances can also 
significantly contribute to embodied carbon emissions especially when used in large volumes. The buildings and 
construction sector accounted for 36% of final energy use and 39% of energy and process-related carbon dioxide 
(CO2) emissions in 2018 [2]. Global buildings and construction sector emissions increased 2% from 2017 to 2018, 
to reach a record high, while final energy demand rose 1% from 2017 and 7% from 2010. Increases were driven 
by strong floor area and population expansions [2]. 
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Fig. 1: Life-Cycle Assessment Phases of Embodied Carbon Emission in General Building 
Source: RMI 


While infrastructure buildings sector efficiency improvements continued to be made, they were not adequate to 
outpace demand growth. 2020 is a key year for countries to enhance their Nationally Determined Contributions 
(NDCs), especially concerning further actions to address energy use and emissions including embodied emissions 
in the buildings and construction sector [2]. Countries are innovating and implementing measures to improve 
efficiency and reduce emissions from their building stock. As sharing effective measures globally would amplify 
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their impact, regional roadmaps are being developed for this purpose [2]. 


Indonesia seriously and consistently continues to conduct its commitment to address climate change through Low 
Carbon Development Planning (PPRK) [3]. PPRK is a strategic transformation of the National Action Plan for 
Reducing Greenhouse Gas Emissions (RAN-GRK) program as stipulated in Presidential Regulation No.61 Year 
2011 [3]. As a form of consistency in efforts to address climate change, the issue is one of the national priorities 
that becomes a cross-cutting program in the 2015-2019 National Medium-Term Development Plan (RPJMN) 
document. President Joko Widodo has delivered a commitment at the UN Climate Change Conference (COP21) 
in Paris, France, on 12 December 2015, which is to reduce emissions by 29% (Fair scenario / using own 
capabilities) and by 41% (ambitious scenario / if you get international support) [3]. The commitment was ratified 
through Law No.16/2016 on the Ratification of the Paris Agreement to the United Nations Framework Convention 
on Climate Change [3]. Aligning with these commitments, Indonesia's efforts directly support the objectives of 
the United Nations Sustainable Development Goals (SDGs), specifically SDG 13, which is "Climate Action." By 
focusing on Low Carbon Development Planning and reducing greenhouse gas emissions. By synchronizing its 
national strategies with global sustainability targets, Indonesia not only demonstrates its dedication to combatting 
climate change domestically but also champions a collective global responsibility. 


Carbon emissions mitigation in the construction sector is not easy. The adoption of digital construction 
technologies emerges as a potent strategy to mitigate carbon emissions in the construction sector. Leveraging 
digitalization allows architects, engineers, and stakeholders to collaboratively refine building designs, 
emphasizing energy efficiency. Energy efficiency could be achieved through several things, such as insulation, 
natural lighting, and heating and cooling systems, effectively diminishing a building's carbon output over its entire 
lifecycle. Digital engineering improvements such as reducing waste and promoting the selection of materials 
through good planning can reduce carbon footprints. Hence, embracing digitalization not only signifies 
technological progression but also propels industry towards the broader objective of environmentally sustainable 
construction and reduced carbon emissions. 


2. BACKGROUND OF STUDY 


The construction industry is faced with challenges such as project delays, over budget costs, quality issues, and 
environmental concerns. Digital construction technologies such as engineering analysis, digital survey tools, 
Building Information Modelling (BIM), and Geographic Information System (GIS) are proposed as a solution to 
these problems. The integration of those technologies plays a big role in optimizing productivity in project 
construction and to ensure engineering validation accuracy. The data which was given by the planning consultant 
will be validated so it can be executed on the field. Initial mapping is processed using digital survey tools. The 
mapping results are used as a basis for the BIM reality model to aid the design team gain a better understanding 
of the project's characteristics. Furthermore, the use of BIM enables real-time monitoring and simulating through 
the entire building life cycle process. GIS is utilized to enhance decision-making for continued management 
monitoring, reducing all risks that could arise throughout the project execution phases. 


According to the Indonesian Public Works and Housing Ministry's Regulation No. 9 of 2021 on sustainable 
development guidelines, the implementation of BIM is mandatory. BIM facilitates the visualization of plans and 
their execution, ensuring a consistent interpretation amongst all stakeholders, thereby minimizing the potential for 
errors or misunderstandings. Although the design process is usually established at the beginning there are often 
real project conditions that do not match the initial design, which triggers design changes. BIM plays a crucial 
role in speeding up the analysis of these design alterations. With the acceleration of this decision-making process, 
BIM also contributes to cost and time efficiency. 


Besides all the benefits that can be provided by BIM tools, to boost the impact on calculating carbon emissions 
on a construction project, the data produced by BIM should be integrated with comprehensive analytical 
calculations to ensure the carbon footprint is determined in a scientifically accurate way. 


Completely eliminating carbon emissions in a project may not be an option, but the project management can 
control and minimize these emissions through responsible selection of materials, designs and working methods. 
Before making these choices, it is important to assess the impact of each option. Thus, the most effective options 
can be selected and implemented in the field. 


This study aims to compare the embodied carbon contained in the initial design of an infrastructure project and 
the alternate design after validating the latest situation on the project site. The necessity for this comparison arises 
from differences in the field conditions, particularly the availability of materials, which did not align with the 
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conditions assumed in the initial design. 


3. PROJECT OVERVIEW 


The Semantok Dam Project is one of the National Strategic Projects in Indonesia, located in the Semantok River 
Stream, Nganjuk District, 115 km West of the Surabaya City, East Java. The Semantok Dam's primary objectives 
are to lessen flood discharge and assure water availability on its coverage area during both the rainy and dry 
seasons due to the intense annual rainfall. The terrain of the Nganjuk district is dominated by the lowlands and 
the mountains, making the soil condition fertile for cultivating plants. Semantok Dam will irrigate the 1900- 
hectare coverage agricultural area in Nganjuk District where its existence expected to boost agricultural 
productivity from 186.33% to 300%. Moreover, the presence of this dam will be the new tourism destination in 
East Java Province. 


The construction cost of this project reaches 87.9 million USD covering dewatering process, main dam, spillway, 
intake channel, facility building, geotechnical and hydromechanical works. With the total main dam’s length of 
3.1 km and height of 31.56 m, Semantok Dam is claimed as “The Longest Dam in Southeast Asia”. The total 
capacity of the main dam is approximately 33 million m°. The length of Semantok’s spillway is 62.69 m with 
overflow discharge of 574.54 m?/second. While the length of the dam’s intake is 16.38 m with the tower dimension 
of 1.75 x 1.75 m using reinforced concrete. 


According to the initial design, rockfills were used as the primary material for the dam. However, the rockfills 
quantity was insufficient in the existing quarry. Therefore, two alternatives were solving the problems. Firstly, 
choosing a new quarry where rockfills are available, but it would drive a significant cost addition and wellness 
issue for the surrounding society. Secondly, using the available materials in the existing quarry, which was random 
soil. Considering the environmental matters, Hutama Karya, as the lead contractor, validated the cost, time, and 
environmental implications and preferred to use random soil as the primary materials. However, a slope redesign 
was required, as the strength of random soil was below that of the rockfills. 


4. JUSTIFICATION DESIGN 


Before the construction of the dam began, Hutama Karya as the lead contractor initiated an advanced design study 
to ensure the feasibility of the initial design, which would then be adapted based on current field conditions. This 
study included a preliminary geological investigation of the construction site and a review of the dam body's 
zoning design, adjusted for the availability of fill materials. The consultant engineers initially developed a grouting 
system as the foundation for the main part of dam, however, Hutama Karya discovered the brittle and loose sandy 
soil layer would cause persistent water leaks over the maximum amount permitted. Hutama Karyaneeded to 
undertake soil analysis to determine alternative design methods and ensure the dam would be strong enough to 
contain water from intense rains without flooding. The results from the initial geological investigation of the 
construction location differed from the initial planning design, necessitating further studies into the dam 
foundation repair plan. 


The Bendoasri and Teritik Quarry are still on the planning stage. In the initial design provided by the consultant 
engineers, the zoning of the dam body is an upright core type with rockfill. However, based on the results of the 
initial geological investigation on construction phase, the two quarries did not have sufficient stone material 
available. The quantity/volume of stone material availability was quite limited, on contrary random soil material 
was abundant. The plan became difficult to accomplish, as the nearby quarry could not produce enough rock for 
the long dam without deep damaging excavation. Another option is digging a new quarry for the site, but that 
would be costly. 


The insufficiency of rockfills in the existing quarry is one of the biggest matters in the Semantok Dam Project. 
Due to environmental and material availability considerations, the project team replaced the rockfills with random 
soils — available material in the fields. However, the random soil strength was below the rockfills. The 
insufficiency of rock created risk to potentially redesigning the dam slope. Engineering analysis software was 
utilized to model the material replacement and verify whether the initial slope design was still applicable in the 
fields. 


In the beginning, Hutama Karya tried to model the random soil in the initial design and found that the safety factor 
of the slope (1.183) was below the minimum requirements (1.302) and showing that the slope was inapplicable. 
Thus, with engineering analysis software, a process of trial and error was undertaken to find the safest and most 
optimized design. As a result, the assessment indicated that the slope should have a steeper incline of 1:3 on the 
left side and 1:2.75 on the right side to produce the safety factor of 1.644, which fulfilled the minimum 
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requirements. Not only did solve the problems, but also able to gain more value in terms of materials and method 
efficiently. Hutama Karya was able to avoid 1.8 million USD of reworking by renewing the slope design. 
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Fig. 2: Calculation Flow 


5. LIMITATION OF STUDY 


This study will focus on emphasizing the calculation of the carbon footprint arising from design changes because 
of engineering justification using Building Information Modelling (BIM) & engineering analysis software. In this 
context, the case that becomes the focus point is the change in the selection of the main material used in the 
construction of the dam body. 


This process of calculating the carbon footprint does not cover all aspects of construction, but rather focuses on 
some essential elements that are directly affected by changes in design. These aspects include the creation of a 
new quarry which is the main source of construction materials, the distance of material delivery from the quarry 
to the construction site, the number and type of equipment used in material delivery and the construction process 
itself. 


Therefore, this study aims to provide a clear and comprehensive picture of how design changes using BIM and 
engineering analysis software can affect the carbon footprint in construction projects and provide a foundation for 
more informed and sustainable decision-making. 


6. CALCULATION METHODS 
This Figure shows the flow of study methods. 


In this study, as illustrated in Figure 2, the calculation method involves comparing two distinct scenarios. In the 
first scenario, we consider the initial design or Design Certification, which upon implementation, has been found 
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lacking in terms of material sufficiency. The available quarry is incapable of providing the necessary quantity of 
materials, leading to the need for another quarry that is considerably distant from the project site. 


The second scenario maintains the usage of the first quarry that planned available material, in proximity to the 
project site but demands alterations in the design, necessitating recalculations and technical justifications. The 
changes are to ensure that the local materials meet the specifications required for dam construction. But the first 
quarry was originally a pine forest area, so the utilization of the quarry will require land conversion, also calculate 
how much carbon stock is lost and how many trees were cut down to facilitate reforestation after construction 
finished. 


Data is collected for both scenarios, including the quantity of materials needed per the initial and revised designs, 
the dam project completion schedule, and the distance between the quarry and the project site. From these data, 
we can establish productivity targets for the work, thus enabling the detection of resource requirements. 


After determining the necessary equipment, the next step is to calculate fuel consumption during the construction 
process for each heavy equipment used. Once the total fuel used is determined, this amount is then converted to a 
form of energy. Energy conversion from the use of diesel fuel to other forms of energy is an average of 38.243 
MJ/Liter or 38.243 x 10^-6 TJ/Liter. After being converted to a form of energy (TJ), the next step is to calculate 
the resulting carbon emissions. To calculate this, we use a formula derived from the IPCC Guidelines for National 
Greenhouse Gas Inventories (2006). 


Emission = Y (Fuel x EF;) E TE E (1) 
J 
Where: 
Emission = Total of Emission (Kg) 
Fuel = Fuel Consumed (TJ) 
EF = Emission Factor (Kg/TJ) 
j = Fuel Type 


For the type of fuel, all heavy equipment used uses diesel oil. The emission factor for the type of diesel oil fuel is 
74,100 Kg/TJ, as shown in Table 1. 


Table 1: Road Transport Default CO2 Emissions Factors and Uncertainty Ranges 
Fuel Type Default (Kg/TJ) | | 


i | Lower | Upper 
! Motor Gasoline 69300 67500 73000 
| Gas / Diesel Oil ' 74100 ' 72600 ' 74800 i 
Liquefied Petroleum Gases ! 63100 ! 61600 ! 65600 ' 
| Kerosene ' 71900} 70800} 73700 
| Lubricants 73300 71900 75200 
; Compressed Natural Gas 56100 54300 58300 | 
| Liquefied Natural Gases 56100 54300 58300 i 


Source: IPCC Guidelines for National Greenhouse Gas Inventories (2006) 


Subsequently, these results are utilized to calculate the carbon emission resulting from each scenario. Finally, the 
data from both scenarios is compared to draw insightful conclusions then plan to replace the carbon lost due to 
land conversion. 
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7. DATA COLLECTION 


As initial information, to find out the body parts of the dam, below is a typical picture of the cross section of the 
dam building structure. 
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Fig. 3: Typical Cross Section of Dam 


The insufficient material is in the Zone 4, which is the most dominant part of the entire dam structure, comprising 
67% of the total volume of the dam. According to the master schedule and contract, the work on the zone four 
must be completed within 24 months or 730 Days to avoid disrupting subsequent tasks. The quarry has two 
alternative locations. The first quarry is in the Bendo Asri & Tritik area, but it does not contain rock soil materials 
that passed the required specifications. The second quarry is in the Blitar & Kediri area and the rock soil materials 
there passed the required specifications. Figure 4 shows the distance between the project’s site to each quarry. The 
distance to the first quarry is approximately 10.5 Km and the distance to the second quarry is approximately 84.6 
Km. 


Project Location 
First Quarry 


Project Location 


Fig. 4: Distance Between First Quarry (Bendoasri & Tritik) and Second Quarry (Kediri & Blitar) with The 
Project 


The result of the assessment on the potential quarry sites shows that the first quarry is in an area originally covered 
by a pine forest. Utilizing this site would convert the landscape of the surrounding area from forest to quarry. The 
second quarry, for comparison, is a pre-existing site so there is no additional land conversion would be required 
for it to be operational, but it has downside which is the distance between the quarry and the project’s site. 


7.1. Initial design / design certification 


This project’s initial design stage is also known as the design certification. At this stage, the initial design specifies 
Rockfills material for the zone four of dam’s body. The material shall be obtained from quarry sources in the 
Bendoasri and Tritik localities, approximately ten kilometers from the project site, as shown in Figure 4. Based 
on Table 2, the initial design data indicates that rockfill material can be sourced from the Bendo Asri & Tritik 
quarry, ensuring sufficient supply. 


Table 2: Initial Condition Based on Design Certification 


' Before Soil Investigation 
Material 


Distance (KM) l Availability 


= 7 = T : T 
Volume required (m°) Volume Quarry (m°) Location ! 


' Rockfills ' 1,998,934 ' 2,390,000 1 Bendo Asri &Tritik 


10 1 1.20 Fulfilled 


Upon further soil investigation in the pre-construction phase, it was found that the quarry did not contain rockfill 
material that met the initial design specifications, rendering it unsuitable as a rockfill quarry. To address this issue, 
an alternative location was sought that contained rockfill material meeting the specifications. A suitable quarry 
was found in the second quarry at the Kediri and Blitar areas, approximately 85 KM away, as illustrated in Figure 
4. Therefore, Case A involves replacing the original quarry with the second quarry, which contains suitable rockfill 
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material. This switch leads to changes in the data, as depicted in Table 3. 


Table 3: Initial Condition Based on Design Certification After Change to Second Quarry 


Alternative Quarry After Soil Investigation 


Material 


Volume required (m°) Volume Quarry (m°) Location Distance (KM) Availability 
1,998,934 3,107,000 Kediri & Blitar Fulfilled =| 


Rockfills 


7.2. Design change / design review 


Following the soil investigation, it was discovered that most of the material at the original quarry located in Bendo 
Asri & Tritik, is random soil. To prevent relocating the quarry, a design review was conducted, which involved 
changing the dam body material from rockfills to random soil. Hutama karya determined that the design issues 
can be solve by using reality modeling and geotechnical design. 


First, a laser scanner undertook at the project area, created point clouds, and molded them into a digital replica of 
the site by using Context Capture Software. The digital replica helped the project team understand the existing 
condition of the field and plan local quarry locations, minimizing the excavation depth to limit the impact on the 
environment. The team then imported reality modeling data into their bespoke project management information 
system, giving the project manager insight into real-time conditions. Next, the organization then augmented the 
reality model with geotechnical analysis via Plaxis Software, enabling them to simulate foundation options and 
test the groundwater flow. Plaxis enabled them to model soil fill within the proposed dam design and test its 
performance within the area’s terrain. Though the initial slope design did not meet safety requirements, they used 
OpenRoads Designer Software to evaluate other slope designs, eventually realizing that a greater slope on the left 
side of 1:3 combined with a lesser slope of 1:2.75 on the right side would meet safety requirements for both 
construction and operations while incorporating the sandy soil as fill. This adjustment will increase the safety 
factor and strength, leading to an expansion in the volume of the dam body in Zone 4. Accordingly, fortification 
calculations were conducted using the Plaxis and OpenRoads Designer Software, as shown in Figure 5. 


Fig. 5: Redesign Calculation of the Dam Body Using Plaxis and OpenRoads Designer Software 


Lastly, to ensure the changes to the design would not impact the tight deadline, Hutama Karya simulated the 
construction with Synchro Software. In addition to testing the construction feasibility of the new design, the 
application helped them plan the construction process. 


= - à 
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Fig. 6: Quarry BendoAsri & Tritik Classification & Quantification Using OpenRoad Designer 


As shown in Figure 6, to ensure the availability of the quarry in that area, the quarry volume was recalculated 
using BIM to speed up the quantification process. Material take off generated from a 3d model created in 
OpenRoad Designer Software to help the engineers accurately visualize upon calculation, it was estimated that 
there is 5.3 million cubic meters of random soil available. Therefore, the first quarry is still utilized, but changes 
are made to the dam body design so that random soil material can still be used, as indicated in Table 4. 
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Table 4: Condition After Design Review Using the Random Soil Material at First Quarry 


Random Soil in First Quarry after Design Review 
Material 


jj 
Ratio | Availability 


231 | Fulfilled: 


j i j 
Volume required (m°) | Volume Quarry (m°) i Location | Distance (KM) 


2,308,176 5,326,000 Bendo Asri &Tritik 10 


T 
| 
| 
1 
1 
' 


Random Soil 


The first quarry is currently a pine forest owned by Perhutani. When this pine forest is converted into a quarry 
location, the carbon stock will disappear. Therefore, to comply with existing regulations and to support sustainable 
project achievement towards the SDGs, Hutama Karya, as the contractor, is obliged to replace the carbon at the 
quarry site by reforesting after the project is completed and the quarry is no longer in use. 


Table 5: calculation of the amount of carbon lost from land conversion. 

| | Carbon of each | Total Amount of 
tree | Carbon 

(KgCo2/Pine) | (KgCO2) 
22.60 3,198,827 


Total Amount of 
Carbon 
(TonCO2) 


3,199 \ 


if p 
| Amount of 
| Tree 
i 
i 
i 
i 


Area Quarry 


Area Quarry 


f 
i ji 
2 | | Type of Tree 
m) | w | vip 


i 200,000 20 Mercus Pine 141,541 


According to the data in Table 5, it turns out that by converting a 20-hectare pine forest, a carbon reserve of 3,200 
tons of CO2 will be lost. This loss will be offset by reforestation after the project is completed. 


8. DATA PROCESSING 


The data calculation begins with the collection of initial data, followed by determining the productivity targets for 
each case based on the distance and travel time to the quarry, as well as the execution time (work schedule and 
working hours). The results of these calculations can be seen in Table 6. 


Table 6: Comparison Resume of Initial Data & Target Productivity each Case 


No | LIST CASE A CASE B 
io] Materials Rockfills Random Soil 
! 2 | Volume required (m°) ' 1,998,934 m? 2,308,176 m | 
© 3 + Volume Quarry (m°) 3,107,000 mo 5,326,000 m i 
4 Location Quarry Kediri & Blitar Bendo Asri &Tritik 
5 ! Ratio Stock Quarry i 1,55 i 2.31 
' 6 | Availability Fulfilled Fulfilled 
7 ! Quarry Type ! Existing Quarry ! New Quarry ! 
t 8 | Distance From Project (KM) i 85 KM | 11 KM | 
9 Duration Quarry - Project (Minute) | 160 Minute | 25 Minute | 
10 Work Hour in a Day ! 8 Hour ! 8 Hour | 
' 11 ' Schedule Duration of Work i 730 Days ' 730 Days ! 
| 12 | Target Productivity/day 2,738 m/day | 3,162 m*/day ' 


After determining the productivity targets, the next step is to ascertain the equipment needs for each case. To 
simplify this process, equipment needs are determined for three different locations: the equipment located at the 
quarry site, the equipment used for construction processes, and the equipment needed for transportation processes. 


As seen in Tables 7, 8, and 9, these illustrate the heavy equipment requirements for Case A, calculated based on 
the work's productivity targets, along with an estimate of fuel consumption for each location. 


Talig T: Sure Requirements & Fuel B at the Quarry Location (Case A) 


: ' ae oe : Consumption/: Work ` Project : Fuel Consumption Total 

Equipment : Type : Unit : Te ` Hour/ ' Hour ' : Duration > . (Liter/ : Consumption | 

: : YP | Machine: : (Days) :Cliterh) : Gay): PASS 
Excavator 01 : Operational Weight 20T : 3 : Diesel : 14 : 8 i 730 : 42 : 336 : 245,280 
> Excavator 02 : Operational Weight 30T : 1 : Diesel : 20 8 730 > 2 : 160 : 116,800 
: Excavator 03 : Operational Weight 50T : 1 : Diesel : 40 8 730 > 40 : 32% : 233,600 
Excavator 04 : Breaker 6 : Diesel : 25 8 730 A 150 : 1200 : 876,000 


TOTAL CONSUMPTION (Liter) ' 1,471,680 


aus 8: Equipment Requirements & Fuel Consumption at the Main Dam Location (Case A) 


: : Fuel Consumption/ : Work - Project : Fuel ee : Total 
Equipment ; Type : Unit i i i z 3 
: ; : Type : Hour/Machine : Hour : Duration (Days) : (liter/h) : (Liter/day) : Consumption ' 
' Bulldozer | | Diesel | 22 + 8 i 730 i 110 i 880 i 642,400 | 


D85SS 5 
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` VibroSmooth : SD110: 5 ! Diesel : 18 ogo: 730 i 90 © 0720 | 525,600 
` VibroPadfoot : SD110 > 5 | Diesel 19 = = 730 © 95 760: «554,800 
TOTAL CONSUMPTION (Liter) | 1,722,800 


Table 9: Equipment Requirements & Fuel Consumption of the Transportation (Case A) 


: : f ` Travel : Average Consumption ` p4, : : Project 

| Equipment : Type : Unit : 3 : Speed ~; T i Mileage : Trip/DT : Duration : Total 7 i 

: l : : ype ` (KMh : (liter/h) (liter/km) (Km) ' ` (Days) : Consumption | 

: DumpTruck : Capacity 30T: 72 : Diesel : 30 : 50 > L67 : 845 + 2.3 Times : 730 : 17,025,060 
TOTAL CONSUMPTION (Liter) ' 17,025,060 


Based on the data above, the total fuel consumption for Case A is 20,219,540 liters. 


Next, the calculation of heavy equipment requirements and fuel consumption on Case B, calculated based on 
productivity targets, can be seen in Tables 10, 11, and 12. 


Table 10: Equipment Requirements & Fuel Consumption at the Quarry Location (Case B) 


; ' ; | Fuel : Consumption | Work : Project l Fuel Consumption : Total 
Equipment : Type : Unit : Type : / Hour / Hour : Duration | " (Litre’ ` Consumption ' 
SPE Machine : | (Days) : Mitre: Gay) i a 
" Excavator 01: Operational Weight20T | 3 © Diesel ` 14 © g lO Bo : 42 + 336 + 245,280 
: Excavator 02 : Operational Weight 30T : 1 : Diesel : 20 = BE 730 : 20: 160: 116,800 
: Excavator 03 : Operational Weight50T : 1  : Diesel : 40 = 8 | 730 = 40 + 320: 233,600 : 


: Excavator 04 : Breaker : 4 : Diesel : 25 : 8 : 730 : 100 : 800 : 584,000 : 
TOTAL CONSUMPTION (Liter) i 1,179,680 


Table 11: Equipment Requirements & Fuel Consumption at the Main Dam Location (Case B) 


Equipment : Type : Unit : Fuel + Consumption/ | Work : Project Duration : Fuel Consumption : Total : 
quip : yP : : Type : Hour/Machine : Hour : (Days) ` (litre/h) :  (Litre/day) : Consumption : 
Bulldozer : D85SS : 4 : Diesel : 22 a: ae 730 ! 88 : 704 : 513,920 

: VibroSmooth ; SD110 ; 4 ; Diesel ; 18 z 8 730 : 72 : 576 420,480 


: VibroPadfoot : SD110 : 4 : Diesel : 19 : 8 i 730 $ 76 608 : 443,840 
TOTAL CONSUMPTION (Liter) 1 1,378,240 


Table 12: Equipment Requirements & Fuel Consumption of the Transportation (Case B) 


: : : : Travel : Average er : : Duration : 

: Equipment : Type : Unit : a : Speed : Consumption : a : pe L: Project: Pee tion: 
YPe | (KM/h) ` (litre/h) . (litre/km) ` ' (Days): sca tid 
: DumpTruck : Capacity 30T : 62 : Diesel : 30 i 50 ; 1.67 : 10.5 : 6 Times 730 z 4,752,300 


TOTAL CONSUMPTION (Liter) À 4,752,300 


Based on the data on Table 10, Table 11 and Table 12, the total fuel consumption for Case B is 7,310,220 liters. 


After calculating the total equipment and fuel needs for Case A and Case B, the next step is to convert the total 
fuel consumption into energy (TJ). This is done by multiplying the total fuel consumption requirements by the 
specific heat of diesel fuel. The results of these calculations can be seen in Table 13. 


Table 13: Conversion the Consumption of Fuel (Liter) to the Consumption of Energy (TJ) 
Energy Conversion Calculation ! CASE A i CASE B 


Consumption (Litre) : 20,219,540 : Litre: 7,310,220 Litre 
Calor Specific (TJ/Litre) > 0.000038243 : TJ/Lite :  0.000038243 :  TJ/Lite  : 
i Consumption (TJ) i 773.255868220 | TJ i 279.564743460 | TJ i 


Once all the necessary data is available, the next step is to calculate the carbon emissions generated from each 
case using the emissions formula (1). The efficiency factor of CO> for diesel type is approximately 74,100 (Kg/TJ). 
The results of the comparison can be seen in Table 14. The table shows that in Case A, there is no additional 
carbon emission from land conversion, whereas in Case B, the carbon emission is increased due to the land being 
converted from a pine forest to a quarry. Therefore, the total emissions generated from Case A and Case B are 
57,298 Ton and 23,914 Ton respectively. This results in a difference in emissions between the two cases of 
33,383.69 Ton. 


As aresult, Hutama Karya, the main contractor, has chosen Case B, which contains lower carbon emissions than 
Case A. In response, Hutama Karya will also implement a carbon recovery plan by reforesting a 20-hectare area 
at the Bendoasri and Tritik quarry sites. This initiative is taken to support the Sustainability Development Goals 
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SECTION D - ENVIRONMENTAL SUSTAINABILITY 


13 Climate Action. 


Table 14: Emission Carbon Calculation 


Emission Calculation 


| Consumption (TJ) H 773.255868220 L TJ ; 279.564743460 |! TJ ! 


; EF CO, Diesel : 74100 > (kg/TJ) | 74100 ` (kg/T)) : 
' Emission (Kg) 57,298,260 Kg 20,715,747 Kg ' 


Emission (Ton) 57,298.26 20,715.75 


Emission From Land Use Change 


Pine Mercussi to Quarry i - : Ton .: 3,198.83 : Ton 
Total Emission ' 57,298 \ i 23,914.57 ' 


9. CARBON RECOVERY PLAN & SUSTAINABLE CONSTRUCTION 


Hutama Karya, in its commitment to sustainable development, plays a significant role as a contractor in the 
construction of the Semantok Dam. As part of their responsibilities, they have implemented reforestation 
initiatives in response to the conversion of the quarry land. Hutama Karya has planted over 2,000 trees on the 
former quarry site and an additional 6,000 trees in the surrounding dam area. Additionally, they have planted 
Vetiver Grass to stabilize the slopes of the spoil bank, which is made of excavation waste material. The project 
was completed in 2022, and Reforestation efforts continue periodically even today. 


Fig. 7: Iriana Joko Widodo, The First Lady, planted date palm trees as a symbol of environmental awareness 
(Left Side), tree planting and fish hatchling activities by Central & Local Government (Right Side) 


In figure 7 (on the left side) during the inauguration of the Semantok Dam, Iriana Joko Widodo planted date palm 
trees as a symbol of environmental awareness. On the right side of Figure 7, enthusiasm for reforestation also 
received support from both the central and local governments, who have participated in tree planting and fish 
hatchling releases into the dam. These efforts are part of a recovery plan intended to replace the lost carbon 
reserves and support SDG number 15: Life on Land, as Carbon recovery plans often involve afforestation and 
reforestation initiatives, which help restore land and create wildlife habitats. 


The reforestation activities may not be enough to replace the lost carbon completely because this project still 
produced substantial amount of carbon. However, the benefits of constructing this dam are also substantial, 
contributing to sustainable infrastructure. When related to the SDGs, these benefits include SDG 2 (Zero Hunger): 
The dam enhances the planting intensity from 186% to 300%, supplies raw water at a rate of 312 liters per second, 
equivalent to potable water connections for 28,000 houses, thus driving rapid economic growth and increasing 
agricultural productivity. SDG 6 (Clean Water and Sanitation): Dams store water and are critical for providing 
clean water and sanitation facilities. They can also aid in waste management and help improve water quality. SDG 
7 (Affordable and Clean Energy): Many dams are utilized to generate hydroelectric power, a form of renewable 
energy. SDG 8 (Decent Work and Economic Growth): The dam's construction creates new job opportunities, not 
only in agriculture but also in fish farming and the tourism sector. SDG 15 (Life on Land): The construction of a 
dam often involves land use changes and can significantly impact local ecosystems and biodiversity. Mitigation 
measures, such as creating new habitats or corridors for wildlife, can help reduce these impacts. Semantok Dam 
created a more sustainable infrastructure towards healthier environment, improved human well-being, and boosted 
economic growth. 


10. CONCLUSION 


The Semantok Dam Project, identified as one of the National Strategic Projects in Indonesia, is located in the 
Semantok River Stream in Nganjuk District. It's situated 115 km west of Surabaya City, East Java. With the dam's 
total length extending to 3.1 km and a height of 31.56 m, it's proudly recognized as “The Longest Dam in Southeast 
Asia.” 
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Digital construction technologies such as engineering analysis, digital survey tools, and BIM played a significant 
role in the project, primarily in acceleration of analysis on design modifications. This acceleration of the decision- 
making process has resulted in enhanced cost and time efficiency. However, despite BIM's effectiveness as a novel 
method, it has limitations, notably in computing the impact of carbon emissions on a construction project. This 
aspect is yet to be fully optimized and requires further development. As a result, a comprehensive analytical 
calculation is necessary to precisely determine carbon emissions. 


In the initial design, rockfills served as the primary material for the dam. However, the quantity available in the 
existing quarry was insufficient. Consequently, there are two alternatives that emerged to address this problem. 
Case A proposed sourcing from a new quarry where adequate rockfills were available, but this option posed 
significant additional costs and potential health concerns for the local community. Case B suggested using the 
available materials, particularly random soil. However, this would require converting a pine forest area into a 
quarry, leading to significant land-use change. 


Data were collected for both scenarios, including the quantity of materials required according to the initial and 
revised designs, the dam project's completion schedule, and the distance between the quarry and the project site. 
This information facilitated the establishment of productivity targets. The productivity targets would, in turn, 
guide the determination of resource requirements. 


The total emissions generated from Case A and Case B were 57,298 tons and 23,914 tons, respectively, resulting 
in a difference in emissions between the two cases of 33,383.69 tons. The analysis revealed that Case B, which 
involved transforming a pine forest into a quarry, resulted in lower carbon emissions compared to Case A. 
However, this land-use change led to a loss of carbon reserves. The project contractor, Hutama Karya, addressed 
this issue through a carbon recovery plan involving reforestation. More than 6,000 trees were planted around the 
dam area and at the quarry sites, with an additional 2,000 trees planted on the former quarry land. Vetiver grass 
was also planted to strengthen the dam’s slopes. 


These ongoing recovery efforts demonstrate alignment with SDG 15 - 'Life on Land,' illustrating a commitment 
to sustainable practices. The dam's construction supports not only SDG 15 but also aligns with other SDGs. These 
include SDG 2 (Zero Hunger), SDG 6 (Clean Water and Sanitation), SDG 7 (Affordable and Clean Energy), and 
SDG 8 (Decent Work and Economic Growth). The project has thus far demonstrated substantial benefits, including 
increased planting intensity, a steady supply of raw water, creation of new job opportunities, and the provision of 
clean, affordable energy. 
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ABSTRACT: To evaluate the energy and solar potential of the building stock and address feasibility studies of 
building retrofit interventions information standards are required to ensure proper data flow from building and 
urban models to simulation environments. Energy performance data are gathered from different information 
containers and therefore the result of simulations needs to be shared in BIM/GIS environments to better address 
energy policies and decision-making processes. Solar potential and energy retrofit estimation, developed by means 
of urban models (U-BEM) are too rough to support a decision-making process, even if at a feasibility stage. On 
the opposite, strategic decisions are defined with reference to large building stocks that require a U-BEM approach. 
To increase the reliability of this kind of simulations the study proposes to integrate U-BEMS with BIM-based data 
that are aggregated and published at urban scale as average performance indicators of built systems. The 
interoperability problem is analyzed both for simulation tools that need to manage this kind of data and 
openBIM/GIS platforms that need to share performance indicators and simulation results. 


KEYWORDS: Energy potential, Solar potential, IFC, BIM, U-BEM. 


1. BACKGROUND 


The necessity to take action in the constructed environment, encompassing both individual structures and entire 
urban areas, stems from the increasing significance of enhancing comfort levels and energy efficiency. This need 
arises due to mounting environmental issues and the imperative to reduce energy consumption (Ratti et al., 2005). 
With urbanization and the increasing impact of buildings on energy demand, there is a crucial demand for accurate 
predictive models that can guide sustainable urban development (Amado & Poggi, 2012). 


Traditionally, urban energy planning and building design analyses have been based on generic models, leading to 
suboptimal energy performance and inefficient resource utilization (Lan et al., 2022). Due to this reason, it 
becomes imperative to establish precise information standards that ensure seamless data transfer from architectural 
structures to urban models, encompassing both geometric and climatic data. These standardized procedures will 
facilitate, for example, the transmission of initial solar potential assessments within simulation environments. To 
overcome all the limitations, the development of sophisticated and reliable predictive models has become 
necessary (Lobaccaro et al., 2019). Such models can provide, in a simulation environment, valuable insights into 
the solar potential of different urban areas, allowing city planners, architects, and experts to make informed 
decisions regarding energy-efficient designs, renewable energy integration, and also climate-responsive urban 
planning (Kabir et al., 2018). 


The examination of outcomes from the new solar potential estimation applications, reveals a noteworthy decrease 
in the computed potential once the feasibility of installing energy production systems like photovoltaics is assessed. 
Analyzing the solar potential of buildings allows for the swift identification of surfaces significantly impacted by 
solar irradiation, evaluating the more suitable for incorporating active solar systems. However, if this assessment 
is conducted on an overly simplistic model, assuming all surfaces possess a uniform level of adaptability, it fails 
to accurately reflect the true solar potential. 


Likewise, concerning the energy retrofit of existing buildings, it is crucial to confirm the practical feasibility of 
upgrading building envelopes and technological systems with energy-efficient technologies. This verification 
should occur within a comprehensive information model aimed at identifying transformation barriers that could 
potentially impact feasibility studies. Such a model would help reference point challenges that might arise during 
the retrofitting process and ensure a more accurate assessment of the retrofitting potential. 


The essence of the issue lies in the requirement for simulations to rely on a dependable information foundation, 
which necessitates a detailed representation of buildings. However, such detailed data is often unavailable during 
the preliminary stages of the study. Consequently, it becomes imperative to establish a system that enables the 
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enrichment of urban models with more accurate values for the subsequent calculation of the solar potential data 
and the energy retrofit. Such models must integrate irradiation conditions on architectural surfaces depending on 
their exposure and geographical location with a broader range of factors influencing energy production and 
transformation to ensure more precise and reliable simulations. 


In fact, the building sector lacks a method that makes it possible, in a relatively short period of time, to assess the 
real renovation potential of an existing building and, therefore, to identify, at a preliminary level, the optimal 
intervention strategy. The lack of information on the transformation of the existing building, on the one hand, and 
the enormous potential offered by the development of new technologies, on the other, make it necessary to identify 
a plan for evaluating the renovation potential of buildings. 


The literature analysis revealed that, to date, the only tool for formulating energy efficiency design hypotheses is 
the energy diagnosis. This is a rather time-consuming process, as well as an economic one. Hence, the need to 
formulate an expeditious methodology for assessing existing buildings. (Mazzarella & Pitera, n.d.) 


Some attempts have been made to propose a method for defining a building potential in Italy, but these go beyond 
the concept of transformability, which is intrinsically linked to the technological element, and outline intervention 
scenarios that are compatible with the valuable characteristics of a historic building, such as ENEA method. In 
turn, the intervention is ranked according to a score determined based on effectiveness, durability, compatibility, 
and cost-effectiveness (Boriani et al., 2011). Other attempts to propose a methodology to analyze buildings are 
represented by TABULA project and CRI_TRA method. TABULA focuses on the proposal of a census of building 
types and their optimization by simulating the effects of possible retrofitting interventions (Corrado et al., 2014), 
while the CRI_TRA method is very close to what is proposed in the rest of this research activity. It is configured 
as a study of the criticality and transformability exhibited by the public housing sector through the assignment of 
a numerical score to the two indices, GDC and GDT (Diana, 2017). 


2. METHODOLOGY 


To achieve the delineated results, the study proposes to integrate simplified models with "transformability 
coefficients" that considered the adaptability and potential of various building surfaces and technical elements to 
hamess solar energy effectively. By identifying surfaces that are most susceptible to solar irradiation, the 
integration of this coefficient with the simulated data offers a more precise representation of the real potential 
within the urban context. 


To establish these coefficients, a representative selection of detailed building models was analyzed. These detailed 
models serve as a basis for deriving coefficients that reflect the unique characteristics of different surface types 
and facades. The coefficients were identified inside the entire urban area using characteristics such as the age of 
the buildings and the architectural similarities. Integrating the coefficients into the subsequent simulation process 
ensures that the data are based on more realistic and specific information. 


The geographic information system (GIS) environment plays a crucial role in this methodology, enabling the 
integration of various data sources, such as climatic data from terrestrial or satellite weather stations, building 
geometries, and urban morphologies, in the calculation of the solar potential data. Through this integration, the 
GIS platform acquires data from 3D simulations that take into account solar irradiance, local weather conditions, 
and the complex interplay of sunlight with urban elements (Bahu et al., 2013). 


Moreover, the simulation outcomes do not only focus on solar potential estimation for photovoltaic installations 
but also extend to solar thermal systems. This broader perspective enables a comprehensive evaluation of the 
renewable energy potential available in the urban context, encouraging the adoption of different and integrated 
renewable energy solutions. 


By applying this methodology to the North Piovego University area in Padua, Italy, the study demonstrates its 
applicability to real urban scenarios. The study area selected for analysis covers approximately 50,000 m2 and it 
is located in northern Italy. By integrating detailed building data, geographic information, and solar simulation 
techniques, this approach provides a robust foundation for optimizing energy efficiency, promoting renewable 
energy integration, and fostering sustainable urban development. 


2.1 Source data for modeling 


Urban data and weather data are closely related to a geographical location and a determined time. Geospatial data 
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can take different forms: raster, vector, and graph data. Raster data is a gridded matrix, organized in rows and 
columns. Vector data represent information through points, lines, and polygons. Graph data are represented by 
edge and node and generally take the form of road networks (Lee & Kang, 2015). Since raster data always have a 
standard dimension, they are considered more basic than vector data, which, on the contrary, are discrete. The 
union of raster and vector data makes the geographic database. The sources of these data are manifold and in recent 
years there has been a radical change in the way the maps are created. While maps were previously only created 
by national land mapping agencies, in the 2000s, thanks to the elimination of intentional GPS degradation, a new 
way of creating maps was born (Haklay & Weber, 2008). The accuracy of GPS, introduced in all mobile devices, 
gives any citizen the possibility of entering information into maps. Numerous studies have named this phenomenon 
differently. They speak of Volunteered Geographic Information (VGI) (Goodchild, 2007), neogeography (Rana & 
Joliveau, 2009) and crowdsourcing geospatial data (Heipke, 2010). The peculiarities of this phenomenon are: 


— data are much more varied than in official cartographies because anyone can add information to the 
map; 
— data are available for any part of the world even for places subject to legal or technical restrictions. 


Main producers of information are the citizens who consciously or unconsciously add information to these 
databases. the most successful project is OpenStreetMap that up to date, counts 10.729.032 users and 
23.604.230.005 GPS points (OpenStreetMap Statistics, n.d.). Other VGI projects are Wikimapia, Map Maker, Here 
Map Creator, Map Share and Waze. 


Citizens can actively or passively add information to the map. The active mode is when people are aware of 
updating the map by participating in some campaigns aimed at updating databases. The passive way, on the other 
hand, takes place thanks to the GPS inside mobile phones, when georeferencing any post on social media, for 
example (See et al., 2017). These maps also contain so-called ‘framework data’, that is the 'most common data 
themes that users of geographic data need', which can 'typically include seven framework data themes: geodetic 
control, orthoimage, elevation, transport, hydrography, governmental units and cadastre. These data represent 
relatively static phenomena and are commonly used for administrative programs, wayfinding, geopositioning, 
geotagging, and other popular services, so they have been a traditional target of government data production" 
(Elwood et al., 2012). 


Regarding the correctness of data, citizens tend to enter or correct only the data they really know, generally related 
to the area in which they live. Many institutions that produce geospatial data have espoused the cause of 
OpenStreetMap, such as for Italy Portale Cartografico Nazionale (PCN), since 2010, makes its images available 
(Italy/PCN - OpenStreetMap Wiki, n.d.). Several studies (Borkowska & Pokonieczny, 2022), (Minaei, 2020), (Dorn 
et al., 2015) have shown that from a geometric point of view, the information contained in OpenStreetMap 
databases are quite correct, especially regarding buildings and the transport network. Another important fact 
concerns the accuracy, which increases proportionally to the urbanization of the area. Accuracy of data is one of 
the requirements also expressed in ISO 19157:2023 geographic Information - data quality. According to this 
standard, the quality of geospatial data is based on several characteristics: 


— Completeness: presence of values describing different characteristics; 

— Logical Consistency: degree of adherence to the rules of the data structure (documented and named); 

— Positional Accuracy: accuracy of the measurement between the given position and the position accepted 
as true within a reference system is. For Global Navigation Satellite System (GNSS), it is within 2-3m. 

— Thematic Quality: accuracy of quantitative attributes and the correctness of non-quantitative attributes 
and feature classifications and their relationships. 

— Temporal Quality: the quality of temporal attributes and temporal relationships of features 


2.2 Open Street Map as a reliable data source 


Due to its large diffusion, the study proposes OpenStreetMap as a reliable and scalable data source. The project 
started in 2004, so it developed right at the same time as the evolution of crowdsourcing concepts. This study takes 
all the urban data from OpenStreetMap, which can in fact export part of the whole map. The exported file can also 
be used as a base map within GIS software. The .osm format is proprietary, but it is written in XML so it is 
interoperable due to an expandable language (Behr & AGSE. 5 2012 Stuttgart., 2012). 


Although the idea is that anyone can insert new information into this map, OpenStreetMap is not included in the 
more than 70 standards of the Open Geospatial Consortium (OGC). Voluntarily OpenStreetMap does not conform 
to the standards because its purpose is not to be a standard but only to be a map containing geospatial information. 
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Despite this fact all OpenStreetMap data can be used and possibly even transformed according to the standards, in 
fact there is third-party software that allows to read .osm files and transform them into shape files. 


OpenStreetMap has its own data structure comprising three geometric features (Nodes, Ways, and Relations) plus 
an object information feature (Tags) (Vargas-Munoz et al., 2021). 

Nodes are the main element in the data structure and represent symbols. A node is a transposition of a point of 
interest on the earth. Several nodes together form a way, which can be open (polyline) or closed (polygon). An 
example of a polyline can be a road or a river, while the most illustrative polygon is a building. At least two nodes 
are needed to create a polyline, while three nodes are needed to create a polygon. It is therefore possible to relate 
several nodes to create a road, but it is also possible to relate geometric objects with information objects (tags). 


Tags consist of two items 'key-value' where the key describes the type or category, and the value is the specification 
of the key. The insertion or modification of tags is free, in fact although OpenStreetMap has a defined structure of 
tags, it leaves the user the possibility of adding different tags. 


Fig.1: Adding a tag with JOSM. 


Unfortunately, the estimates clam that less than 3% of the objects have the height key filled in and very often also 
wrong. Height is the typical datum that must be entered by the user to be correct, it cannot be taken from satellite 
orthophotos. Calculating this data could be difficult to obtain especially if the building is tall, which is why the 
study (Bshouty et al., 2020) of has created an app "OpenStreetHeight" that can calculate the height using a 
photograph. The height for the buildings, in the selected area, was not present within the .osm file but was visible 
through the OSMBuildings application. This site, written in Java, shows at the three-dimensional level the 2D map 
of OpenStreetMap According to what has been said so far, if a data is wrong or missing everyone can change and 
update it, and that is what has been done. To be able to modify the data there are several editors. There are both 
computer and telephone applications. The best known and most widely used are JOSM (Java OpenStreetMap) and 
Potlatch (Neis & Zielstra, 2014), both are computer applications. Through these applications, the map can be 
viewed and edited. Adding the data, using JOSM, is very simple just import the map and selecting the building 
polygon add the tag. 


2.3 Generation of the model 


To create the model of the case study area, the initial step involved the editing of the source OSM file. In fact, 
certain unnecessary data had to be modified or removed from the starting .osm file. The primary data was generally 
accurate, except for building heights, which were often incorrect, listed as a default "3 meters" when not available. 


To acquire accurate data, several methods were considered, also including LIDAR technology (Manni et al., 2022). 
LiDAR sensors can gather highly precise and detailed elevation data. These advanced sensors emit laser pulses 
toward the Earth's surface and measure the time it takes for the laser to return after hitting an object or the ground. 
By analyzing the return time and the wavelength of the laser, LiDAR systems can calculate the distance between 
the sensor and the target surface with exceptional accuracy. In any case, it is important to have adequate equipment 
to carry out these surveys, and for that reason, in this case, building heights were manually calculated by cross- 
referencing data like the number of floors, using JOSM to modify and delete data. Additionally, two new tags were 
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inserted: 


— Transformation coefficient: which represents the percentage of facade or building transformability, 
indicating the freedom to incorporate new installations on surfaces or interiors. 
— % glazing: which represents the total glazed area relative to the total area. 


Correct values for these new parameters will be calculated for each building and replaced together with the starting 
information. Each element in the OSM file has a unique ID number to identify it. The default value is 1 for façade 
transformability and 0 for glazing. 


A generative script developed with Grasshopper’s Urban tool generated building volumes starting from OSM 
metadata and including the urban geometry of roads and external spaces. Each building is represented as an 
extrusion of its planar geometry. The source file also contains other information, such as the number of levels, 
function type, and structure identifier, unique to each building. 


Climatic and environmental conditions are gathered by .epw file. Ladybug offers a library of data collected from 
weather stations worldwide, downloadable from their official website (EPW Map, n.d.). For the case study, data 
from the Treviso airport weather station was used due to a lack of specific data for the Padua area. The available 
data is for year 2020, and the simulations presented focus on the entire month of July 2020. 


Before conducting simulations, the presence of greenery and trees must be considered, as they significantly impact 
solar potential and act as crucial mitigating elements. Therefore, the 3D geometric model was supplemented with 
the modeling and placement of trees and green elements based on their actual arrangement, height, and leaf density, 
utilizing a site survey and satellite images for accuracy. 


2.4 Solar potential study 


Once the basic geometric model is prepared and enriched with additional information, the analysis of solar 
potential can commence using various available tools, both in GIS environments and beyond. In a comparative 
study conducted by (Giannelli et al., 2022), five main tools, including GRASS GIS, ArcGIS, SimStadt, CitySim, 
and Ladybug, were assessed by comparing their simulation data with ground truth data obtained from a weather 
station (Jakica, 2018), (Peronato et al., 2018). Among these tools, Ladybug was found to be one of the most 
accurate for solar radiation studies, sky view analysis, sunlight hour modeling, and more (Freitas et al., 2015). 
Therefore, it was chosen as the basis for the research work in this study. 


However, a notable challenge faced by all the tools, including Ladybug, is that the data obtained through 
simulations are related to simplified surfaces that do not correspond to the real conformation of buildings. 
Consequently, they may not provide reliable estimates of the exact solar potential, as factors such as surface 
conformation and the percentage of glazing can significantly impact the results. For instance, certain surfaces may 
not be suitable for the installation of new energy systems throughout the entire area (Esclapés et al., 2014). The 
issue arises as simplified models treat all surfaces uniformly, considering them flat and non-glazed facades. 


To address this problem, the study aims to introduce coefficients that can account for these specific building 
characteristics and rectify the results obtained from the simulations. By identifying and incorporating these 
coefficients into the analysis, the study seeks to obtain more realistic and reliable data with minimal additional 
complexity. This process involves coding in Grasshopper to automate the identification and application of the 
coefficients to the simulation results, streamlining the evaluation process (Assouline et al., 2017). 


2.5 Energy Potential study 


The definition of the energy retrofitting potential of a building is based on the development of a method which 
allows the cataloguing of technical elements and their subsequent differentiation through the analysis of geometric, 
aesthetic and technological factors and the relative retrofitting potential. The main objective of this research 
activity is to develop a simplified type of rapid assessment model about building technical elements, whose 
horizontality allows the extension of the experimental scheme to the entire building system to evaluate its potential 
and criticalities and to direct a specific renovation process. The formulation of a percentage score, an indicator of 
the building retrofitting potential, constitutes the tool for pre-evaluating the effectiveness of its efficiency and for 
comparing buildings belonging to the same stock to identify a design strategy that is convenient both from an 
economic and energy point of view. 


The structure of the methodology is based on the definition of the upgrading potential, which quantifies the 
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possibility of increasing the performance level of a technical element by implementing specific retrofitting 
measures. For example, the presence of geometric and aesthetic constraints could significantly limit the building 
transformability. Although the upgrading potential is directly focused on the element itself, the definition of 
retrofitting potentials introduces a preliminary assessment of possible interventions. 


Once virtual potential for improvement has been defined, it is necessary to introduce the concept of 
transformability, which indicates the readiness of a technical element or a building to accommodate a technological 
solution aimed at increasing the energy efficiency. Thus, this parameter represents a tool to moderate the value 
assumed by the potential according to the real possibility of intervention. The definition of this parameter considers 
geometric, aesthetic and technological features of each technical element. In fact, the transformability coefficient 
(T) is the result of geometric mean (Equation 1) of the geometric (Tg), aesthetic (Ta) and technological 
transformability (Tr). 


T= 1°14 'Tr (1) 


Therefore, the combination of the virtual potential and transformability rate into a single index, called effective 
potential for upgrading, guarantees the moderation of technological element potential as a function of its 
transformability. 


To define effective potential rate, this method involves a careful process of analysis of the specificities that 
distinguish each item. Both transformability and upgrading potential are subjected to the assignment of a score, a 
coefficient between 0 and 3 to each index. A coefficient of 3 corresponds to the maximum possibility of 
intervention to which a technical element can be subjected. On the other hand, a value of zero is determined by 
the impossibility of intervention: in the case of potential, zero indicates the absence of actions that increase the 
energy performance of a building component (if it is already particularly performing), while a zero transformability 
derives from the presence of constraints that inhibit any design strategy. Depending on the type of component, the 
meaning of the rates changes according to the peculiarities. 


The decision to use a classification system with a peer number of coefficients stems from the desire to reduce the 
possibility of a high percentage of components falling within the median. 


At the same time, this research work proposed the decomposition of the building system provided for by UNI 8290 
and identifies five technical elements, such as opaque vertical components and windows, opaque horizontal 
components and heating systems. In addition, the methodology included a census of these elements to determine 
their profile in terms of energy potential. The entire set of examples is the result of research into sector manuals 
and direct observation of buildings in nearby cities: in particular, the attempt to grasp the complexity and variety 
of technical elements is one of the main objectives of this compilation. However, it must be stressed that the 
definition of building archetypes is a subject that will be further developed to capture the diversity of the building 
sector. 


3. EXPERIMENTATION 


The research work, for the solar potential study, involved the analysis of two significant buildings within the study 
area, focusing on their geometric and solar aspects. The first building, primarily used for offices, had a rectangular 
floor plan with an overall height of 30m. Its facades featured a series of pilasters, creating a rhythmic pattern of 
window openings. The second building, with the university department of industrial engineering laboratories, had 
a rectangular plan with a height of 9m. Its facades were predominantly composed of windows, forming an 
overlapping grid pattern. The roof of this building had a distinctive stepped pattern. 


To analyze the solar potential of these buildings, two models were developed for each, a simplified model, and a 
more detailed one. Ladybug for Grasshopper was used for the simulations to obtain the solar potential values for 
each facade. The factors that significantly influenced the actual solar potential were the percentage of windowed 
and opaque surfaces and the conformation of surfaces susceptible to transformations. The shading factor was 
already considered by Ladybug, and the presence of elements like balconies would be accounted for in the detailed 
model. 


An equation (Equation 2) was derived to relate the real solar potential value (SP,) to the simplified solar potential 
value (SP,), the percentage of the opaque area (“o0Parea), and the transformation coefficient (Tcoerr). The simplified 
solar potential value was obtained through Ladybug simulations on the simplified model. The opaque area 
percentage was calculated from geometric analysis using software like AutoCAD. The transformation coefficient 
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was derived by using an inverse equation based on the real solar potential value from the detailed model. 
SP, = SP; ` %OParea ` Tcoeff (2) 


The data for the transformation coefficients, opaque area percentages, and glazing percentages were then included 
within the .osm file to enrich the information for each building. The JOSM software facilitated editing the .osm 
file to create new categories of parameters and associate their values with each building. For simplicity, a single 
parameter was associated with each building for the new data categories, taking the average values for each facade 
and roof to account for varying conformations. 


By applying the new coefficients to the data of the simplified solar potential, a more accurate estimation of the 
solar potential value was obtained, considering the actual building conformation. The additional data calculated 
using this methodology was inserted into the .osm file for each building using specific identification codes. The 
file was then exported in .xml format, allowing the GIS environment to read the new data along with the existing 
data in an open-source format. This approach enables the creation of more realistic territorial models, providing 
more reliable information on solar potential at a granular level. 


The energy potential research activity involves the transformation of the initial qualitative approach, linked to the 
formulation of a rehabilitation potential index, into a scientifically based methodology. To achieve this, the 
development of energy simulations made it possible to provide a numerical basis for the initial survey. In this way, 
the results obtained are crucial to the establishment of the methodology, like the definition of the rehabilitation 
potential. Specifically, the aim is to quantify the reduction in energy consumption resulting from the 
implementation of a given retrofitting measure affecting a specific component, leaving the conditions of the other 
technical elements unchanged. The experimental activity leads to a ranking of the efficiency measures, expressed 
in terms of energy demand reduction. Thus, it will be possible to relate the actual improvement potential of each 
technical element to the actual energy footprint assumed by the same. This phase is not an integral part of the 
methodology, but it is functional to validate its results for subsequent applications. 


The verification of the previously assigned virtual improvement potential rates required the development of energy 
simulations using the EC700 software supplied by Edilclima. Edilclima is one of the main Italian software 
packages for calculating energy diagnoses. The choice of this calculator is due to the high interoperability between 
the BIM modelling, carried out in Revit, and the energy simulation software. In fact, the introduction of EC770 
(the plug-in provided by the company) makes it possible to derive a large part of the input data for determining the 
energy performance of a building from the architectural model, speeding up the process. 


Moreover, the procedure has introduced a series of energy improvements, each of which is intended for one 
building component. The decision to use this specific method, which is very different from the usual practice, 
stems from the desire to be able to calculate the influence of the improvement in the efficiency of a specific 
technical element on the overall behavior. Several architectural models have been defined in the process to group 
together the building types studied during the survey phase and, therefore, to evaluate the energy savings 
percentages by carrying out a considerable number of energy simulations. Specifically, for each macro-category 
of technical element, a common design state was identified to compare the percentages obtained. 


For this purpose, it was considered appropriate to adopt an extremely simplified architectural model. A rectangular 
building with a side of 10x6 meters, developed on two levels, was implemented in Revit. The floors have the same 
layout and contain only one room. The construction characteristics are the most common in Italy. It is a load- 
bearing masonry construction, with brick floors and roofs. The energy performance is particularly poor. To 
determine the actual impact of a technical element on the overall energy savings, it was considered appropriate to 
vary the energy characteristics of the considered component, while leaving the other parts of the building organism 
unchanged. An attempt was made to reproduce the most representative cases of the previous survey. 


Four possible types of heating systems were identified: radiators, radiant floors, fan coils, full-air system. 


Two boiler variants were associated with each type, such as a traditional boiler and a condensing boiler. The energy 
retrofit intervention includes the replacement of the central heating system with a heat pump and the installation 
of a photovoltaic system with a total power of 4.5 kW, while keeping the envelope performance unchanged. The 
following results were obtained: 
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Table 1: Schematization of the reduction in energy consumption due to heating system retrofit 


Type of plant Heating generator EP pre intervention EP post intervention Energy saving 
Boiler 318,86 kWh/m7year 99,00 kWh/m’year 69% 
Radiators 
Condensing boiler 294,06 kWh/m?year 99,00 kWh/m?year 66% 
Radiant heating Boiler 302,93 kWh/m’year 99,00 kWh/m’year 67% 
Boiler 305,27 kWh/m’year 118,60 kWh/myear 61% 
Fancoil 
Condensing boiler 281,60 kWh/m?year 118,60 kWh/m?year 58% 
Boiler 319,07 kWh/m7year 67,14 kWh/m’year 79% 
Air system 
Condensing boiler 290,67 kWh/m?year 67,14 kWh/m?year 77% 


Therefore, it can be concluded that the maximum reduction in energy consumption is obtained from the efficiency 
of the air system, which is over 70%. This result confirms the value of 3 of virtual potential given to the system 
under consideration. Then, this reasoning was extended to the five technical elements to obtain a complete view 
of the thermal behavior of a typical building. 


Furthermore, the values obtained were used as a tool to calibrate the energy potential values initially assigned 
based on qualitative assumptions about the energy behavior of each component. This phase was supported by the 
analysis of the results in Excel. In fact, the formulation of a classification of the reduction in consumption 
attributable to the improvement of the different types of each technical element made it possible to place the 
corresponding potential values within four energy saving ranges. 


During the experimental stage, the method was applied to a specific case study. The building is one of ATER’s 
buildings of Padua and it is in Ferdinando Coletti Street. It is an apartment block, developed on four levels and 
divided into three different staircases. Each staircase serves a total of eight residential units, consisting of 
approximately 45 sqm, whose layout is repeated on all four levels. The size of the typical apartment is rather 
humble: it consists of a living area with a kitchen and a separate sleeping area, which is served by bathrooms. The 
building has undergone various renovations over the years, which have not ensured its good state of conservation. 
In the 1970s, a refurbishment was carried out with questionable results: the total renovation of the facilities allowed 
the construction of an internal toilet for each flat. Moreover, the energy simulation confirmed its energy-intensive 
nature. The absence of insulation and technological solutions ensures the veracity of the results. 


Established the poor energy performance of the current state, two different intervention scenery could be identified. 
Specifically, the first is related to maximizing the heating system efficiency, while the second is related to 
maximizing the insulation of the envelope to reduce dispersion. 


The first scenario concentrates almost all resources on the energy retrofitting of the heating system, followed by 
limited insulation measures. The retrofit action foresees the introduction of renewable energy sources and the 
installation of a heat pump generator and a radiant floor, after the demolition of the existing flooring with a 
consequent increase in height, but it is compatible with the functional-spatial characteristics thanks to the high 
room heights. The autonomous system configurations remain unchanged. The installation of a centralized 
photovoltaic system on the roof provides a large proportion of the energy required by the new heating plant. Once 
the retrofit intervention on the installation system had been defined, the discussion focused on evaluating the 
reduction in energy demand caused by the insulation of the envelope. Thus, it is possible to separate the 
contribution to the efficiency of envelope structures from the overall behavior. These retrofitting actions include 
the insulation of opaque components and the replacement of windows to comply with the transmission limits 
imposed by Italian regulations. 


The second scenario focuses attention on increasing the insulation of the envelope and, regarding the systems, 
replacing the existing generator with a condensing boiler and the existing heating terminals with a radiant floor to 
ensure proper operation at a lower temperature. The individual system configurations remain unchanged. 
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4. ANALYSIS AND EVALUATION OF RESULTS 


During the final stage, it becomes crucial to evaluate the obtained solar potential data to determine if there are 
significant differences between the results obtained from simplified models and detailed models, and whether the 
simplified data is adequate for the evaluation purposes, especially when considering the inclusion of a new system. 
For instance, upon analyzing the data regarding the total solar potential of some of the buildings present in the 
area, it was measured a considerable difference corresponding to a potential drop of about 25%. The discrepancies 
in the results are quite significant, emphasizing the importance of model enrichment to avoid overestimation of 
solar potential when relying solely on basic models for evaluation. The detailed models provide a more accurate 
representation of the solar potential, making it evident that the use of simplified simulations alone may not suffice 
for precise assessments and decision-making regarding the implementation of new energy systems. 


The application of the EP method to the case study made it possible to identify its main weaknesses and to optimize 
its use. In particular, the conversion of the numerical scale with values between 0 and 3 into percentages required 
various experiments and subsequent adjustments. The initial approach was to proportionally convert the numerical 
indices of transformability and upgrading potential: however, the implementation of the method highlighted that 
situations characterized by low transformability were particularly disadvantaged in the calculation of potential for 
energy redevelopment. The reason can be found in the series of multiplicative operations that lead to the definition 
of the upgrading potential: in fact, a low potential value is obtained starting from average values assumed by the 
transformability and the virtual potential. The further multiplication between the effective potential and the 
technical element weight contributes to a further reduction of the achievable energy savings, making it 
incompatible with the results of energy simulations. The first attempt to transpose the transformability and the 
virtual potential was carried out independently of the type of technical element, disregarding the results of the 
energy model. For this reason, it was necessary to completely revise the entire transformability allocation matrix, 
calibrating the percentages for each technical element. To facilitate the implementation of energy retrofitting 
measures, it was appropriate to shift the entire numerical scale to values greater than 70 per cent, thus projecting 
values in the range between 70 and 100 per cent. The agreement between the results of the fast method and the 
energy simulations was the basis for the choice of the percentage range. 


The graphs below show a comparison between the results obtained by using the methodology, the energy model 
and the projection of the energy savings obtained by the efficiency of a model type (with construction 
characteristics very closed to the case study), representing the research building. In the first scenario, the analysis 
of the deviations shows a maximum deviation of 11%, related to the heating system. 


On the other hand, in the second scenario, a maximum deviation of 24% is reported for the insulation of the vertical 
perimeter walls. The reason for this difference can be found in the proposed interventions: in fact, the walls of the 
case study have external aesthetic reliefs that are incompatible with the installation of an external insulation system. 
The insulation is therefore provided internally. On the other hand, the simplified model proposes the external 
technological solution which, for the same thickness, leads to a better energy performance thanks to the correction 
of thermal bridge. Where: 


— Arepresents the gap created between the reduction in consumption derived from the methodology and 
the energy simulations. 
— B represents the gap created between the decrease in consumption derived from the methodology and 
the projected decrease in consumption. 
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— Crepresents the gap between the reduction in consumption derived from the energy simulations and the 
projected reduction in consumption. 


Fig.2: Comparison of methodology results of heat pump systems (scenario 1) and condensing boiler (scenario 
2) 


5. CONCLUSION AND FUTURE WORK 


The innovative aspect of this process revolves around information-related elements, as modern tools process 
geospatial data that can't be modified based on typological criteria to enhance analysis results. The provided 
proposal delineates the methods to attain these objectives, approached from two key angles. Firstly, from a 
disciplinary standpoint, it involves the establishment of transformation coefficients. Secondly, from an 
informational perspective, it revolves around fostering the interoperability of solar potential reduction factors. 


To apply this methodology to various building contexts and urban environments, it becomes crucial to categorize 
different case studies of typical facades and roofs, examining the similarity between buildings. This allows for the 
creation of a set of moderating coefficients and parameters associated with the urban environment, considering the 
structure's form and glazing percentage. For scalability, the process of creating a solar map at the neighborhood 
level should be as automated as possible to establish urban environment parameterization standards. 


One limitation of this approach is that when selecting corrective coefficients on a large scale, it requires more time 
for context analysis and relies on source materials such as aerial photographs or files that identify the differentiation 
between glazed and opaque surfaces, which may be challenging to obtain. Moreover, it would be interesting to 
extend the study further by assessing how much energy could be generated from various PV systems to determine 
the percentage of energy needs that these systems could cover for the buildings in the area. 


The study was conducted in a small area of Padua, but with the necessary adaptations, the data could be replicated 
across the entire city map. This would enable the integration of the findings into the OpenStreetMap project, 
expanding the database with valuable information on Solar Potential for the entire city. It is evident that the results 
show that the methodology is validated by the energy simulations by analyzing energy potential method. In fact, 
the percentage deviation of the values is around 10%, a percentage that can be considered acceptable in a 
preliminary assessment. Therefore, the application of the method allows a quick screening within a building stock, 
identifying the buildings responsible for greater energy savings. 


Specifically, it was possible to highlight that heating systems constitutes the most decisive factor in the process of 
formulating the design hypotheses: thus, it is not possible to disregard the evaluation of plant transformability that, 
in the first instance, governs the declination of the retrofit intervention. This assumption derives from the fact that, 
with the same envelope, the replacement of a traditional generator with a heat pump, the installation of a radiant 
floor system and a photovoltaic system cause a decrease in energy consumption of approximately 70%. The 
discriminating factor is the possibility of installing renewable energy production systems, which make the thermal 
behavior of the envelope take second place since the energy produced is largely free. Therefore, it can be said that 
maximizing plant efficiency is the winning strategy. Where systems are characterized by good performance, it is 
more convenient to intervene on the envelope insulation. 
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ABSTRACT: The construction industry consumes a large amount of raw materials and produces large amounts 
of carbon dioxide emissions. However, studies have shown that philosophies alone are not efficient in solving 
problems in the construction industry. They must be supported by new tools and methodologies. Therefore, this 
study aimed to achieve a more sustainable building field by integrating BIM technology and value engineering 
principles in the management of building materials. to achieve the highest possible consumption of environmental 
resources and materials through value engineering. The methodology employed in this study was to develop a 
material waste management system for construction projects. Starting in the early design phase, develop a 
decision-making process for selecting the optimum floor tile size according to room dimensions. Some materials, 
such as floor tiles, wooden panels, and marble, can be used more efficiently using BIM and scheduling tools. Floor 
tiles are essential finishing materials in the AEC industry. The initial findings outline the benefits that can be 
obtained by using BIM tools to achieve waste minimization through value engineering principles by creating an 
automation process to choose the best floor tile size according to the space width and length and minimize the 
percentage of cut tiles to the total number of tiles that are used in the space. This provides a game-changing 
solution for construction stakeholders. 


KEYWORDS: Building Information Modeling, Value Engineering, Sustainable Construction, Material 
Management, Construction Site Management, Architectural Engineering and Construction. 


1. INTRODUCTION 


The construction industry is a critical sector in terms of economic sustainability. This enhances economic growth 
because it affects other economic areas. Appropriate building material selection and recommended construction 
details significantly affect the project cost. Moreover, their consumption value is approximately 40% of a project’s 
total cost [3]. The designer ensures that the materials used in the proposed design are chosen accurately. 


Floor tiles are major building materials widely used in the Architectural, Engineering, and Construction (AEC) 
industry. Moreover, it is used in every project with different materials and sizes. is also an essential material in 
architectural decoration, and its annual consumption worldwide has reached billions of square meters. For example, 
ceramic tiles, a type of floor tile, require high-temperature firing in factories to produce them, resulting in high 
energy consumption and significant pollutant emissions., thereby posing a serious threat to human health. The 
annual consumption of ceramic tiles reached 13 billion square meters by 2020, and more than half were used as 
floor tiles [6,7]. Improving the efficiency of floor tile application plays a critical role in promoting sustainable 
development in the AEC industry. Previous studies have shown that improving accuracy, effectiveness, and 
comprehensiveness may be an effective way to improve project benefits from the design perspective [6]. 


Compared with refined design, the waste rate difference of building materials caused by different design 
approaches can be as high as 41% in general architectural construction, and the difference in construction labor 
resource waste (e.g., rework) also shows a positive correlation [8]. 


In the architectural project, the architects chose the floor tiles according to the color and design, regardless of the 
size of the tiles and the wastage of cut tiles. Taking into consideration that Most tile producers and suppliers have 
different sizes for tiles of the same design and color. In theory, layout design requires architects to accurately plan 
the laying and cutting of materials. The design should include uncut and cut tiles and provide accurate graphics 
and figures for the following steps to achieve lean material management [9]. 
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Therefore, the main objectives of this study were as follows: 


To optimize the waste of floor tiles in construction projects from the early design phase. 

Choose the best flooring tile size for the room dimensions to minimize the waste ratio of the flooring tiles. 
Create a practical method for selecting the optimum floor tiles. 

To reduce the time required for technical office engineers in the takeoff process. 


The optimization process is performed through the integration of the Value Engineering (VE) principles with 
Building Information Management (BIM) as a tool to input the data and the dimension of the space as parameters, 
input the different optional sizes of floor tiles, and apply the VE principles and equations. 


1.1 BIM and VE integration approach 


A project's success and higher market value (fulfilling the owner's specifications) depend on controlling the 
construction schedule and costs. To reduce the overall costs, stakeholders have increasingly been used in 
construction projects. The early project phase offers a great opportunity to use BIM to streamline VE. A 
bibliometric analysis was performed by Baarimah et al. in 2022 to determine the benefits of combining BIM and 
VE. The findings demonstrate that VE and BIM support rising prominence as mainstream subjects related to the 
building industry and decision-making around cost-earned value. The evaluation of generated alternatives using 
predetermined criteria is the most important stage in VE applications. Stakeholders can use multi-attribute criteria 
to integrate created models by designing an automated method to assess and contrast these options. 


1.2 Value Engineering (VE). 


Value Engineering (VE) is a proven management approach in the (AEC) industry that is used to improve the 
functioning of projects and eliminate unnecessary costs. Because the construction industry has faced various 
challenges in reaching a project's high value on time and within budget, VE has been applied in numerous countries 
around the world for half a century [9]. VE has become an integral part of the development of many projects' 
development [10]. Surveys have reported that VE can save as much as 5—10% of construction project costs [11]. 


The VE study procedure called the VE job plan, is a systematic problem-solving technique comprising the 
following phases: information, function, creativity, evaluation, development, and implementation. Among these 
phases, the creativity phase, followed by function analysis, is the most crucial for generating innovative ideas that 
require existing information and experiential knowledge from past VE projects [12]. 


Value = Function + Quality/ Cost 
Where: 


Function = The specific work that a design/item must perform, which must be the same for all the options of floor 
tiles 


Quality = owner’s need, which is the percentage of cut tiles. 
Cost = life cycle cost of the product. Moreover, the additional cost of the wastage of the materials 


Assuming that the materials are compared, they have the same quality of manufacturing with the same design, 
color, and materials of different sizes, for example, the same ceramic tile with different sizes only. In this case, the 
tiles have the same function and quality. 


1.3 The current approaches to selectin floor tiles 


In most cases, architects choose floor tiles in the conceptual phase of the design by focusing on the type, color, 
and texture based on design principles, without paying attention to the importance of the floor tile size in the waste 
management process in the early decision-making stage of the project. Subsequently, shop drawings were drawn 
without providing the exact tile requirements. Therefore, quantity surveyor engineers estimate the exact number 
of floor tiles, including uncut and cut tiles manually from shop drawings, which requires considerable effort and 
time. 


For these procedures, it is difficult to produce a different shop drawing for each floor tile option to estimate the 
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number of floor tiles with cut and uncut tiles, and to have a waste ratio to be able to choose the best floor tile size 
for each space. 


As shown in Table 1, for two rooms with the same area and different dimensions, the default most commonly used 
ceramic floor tiles are 60 x 60 cm. There were different numbers of tiles used in the two rooms. 


e The waste ratio for Room 1 was 14% and the total number of tiles used was 30. 
e The waste ratio for Room 2 was 14% and the total number of tiles used was 40. 


As shown in Fig. 1, the flooring tile layout plan for the default selection is 60x60 cm. There is a clear-cut tile in 
the two rooms, which can be avoided by using right-floor tiles. 


Table 1: Parameters of default selection size. 


Area. Length. Width. Long Ratio. Total tiles. Uncut Tiles. Cut Tiles. Waste Rate. 
Room 1 120000cm2 400cm 300cm 1.3 35 30 5 14% 
Room 2 120000cm2 600cm 200cm 3 40 30 10 25% 


400 cm 


Room 


1 


Fig. 1: Flooring tile layout plan for the default selection 


2. RESEARCH METHODOLOGY 


The purpose of this study is to develop material waste management for construction projects. Starting in the early 
design phase, develop a decision-making process for selecting the optimum floor tile size according to room 
dimensions. 


As shown in Fig. 2, using a case study plan, this study focused on providing an automation framework according 
to the BIM model integrated with the VE job plan. The methodology focused on creating decision-making tools 
depending on the VE. The integration of the VE job plan and BIM into the framework through the optimization of 
the quantity of waste by applying the VE job plan through BIM tools to reach the optimum value by increasing 
the quality of the tile floor plan by decreasing the number of cut tiles to have as much space as possible with non- 
cut tiles, which is very visually comfortable, and decreasing the cost by choosing the optimum size of the tiles. 
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Input the BIM model parameters. Floor tile options 


Export model information. 


Create the waste ratio for the floor tiles depending on the space dimensions ratio and the floor tiles size. 
Compare the waste ratio for each option. 


Find the optimal waste ratio. 


Apply the floor tiles, which ha 


ve the best waste ratio for each space. 


Fig. 2: The flowchart of the design decision-making tool 


Yes 


As shown in Fig. 3, the integration between the VE job plan and BIM optimization framework phase inputs the 
main parameters required for the floors. 


e Function analysis: All the floors that were selected in the process must meet all the function requirements. 

e Creativity Phase: Alternative sizer selection options. 

e Evaluation phase: start to comprise the alternatives which had been chosen to meet the best quality and 
assess the risk for each option. 

e Development Phase: Finalize the cost and schedule impacts. 

e Implementation Phase: Initiate applying the optimum selection choice per quality and cost. 


VE job plan BIM model - Methodology 


Input the Bim model parameters 


Information phase Materials requrments 
Floor tile options 


> All tiles must have uniform specifications except size 
Function analysi 
to perform the same function 


Create the waste ratio for the floor tiles depending 
on the space dimensions ratio and the floor tiles size 


Creativity Phase 


Evaluation phase > Compare the waste ratio for each option. 
Development Phase —> Find the optimal waste ratio. 


; Apply the floor tiles, 
implementation Phase á 
which have the best waste ratio for each space. 


Fig. 3: The integration of the VE job plan and BIM into the optimization framework 
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3. CASE STUDY 


Starting phase: Building the BIM model and inserting the floor tile options as parameters for each room, including 
the length, width, and dimensions of each floor tile option for each space in the model. 


Fig. 4 shows a case study of two different rooms with the same area and different dimensions. 


Matenals and Finishes a 
FLOOR TILE OP.1 60x60 

FLOOR TILE OP.2 50X50 

FLOOR TILE OP.3 45X45 
Dimensions a 
Area 12.000 

4 H $ 4 © 
Not puted 

LENGTH 400.000000 

WDIH 300.000000 

Computation Heg 00 crr 
Identity Data R 
Number 1 


Fig. 4: the model instant parameter for each space 


The next step is to create a Revit schedule for the instant parameter for each space, including the area of the room, 
length, width, and floor tile options. As shown in Fig. 5. 


<FLOOR TILES OPTIONS> 
A B T D E F G 
Name Area ; LR : WR ` FLOOR TILE OP.1 ; FLOOR TILE OP.2 FLOOR TILE OP.3 


:60X60 45X45 


50X50 


Fig. 5: the model instant parameter for each space 


Then export the Bim model schedule to an Excel sheet to apply the equations for each floor tile option, including 
the total tiles, uncut tiles, cuts tiles, and waste ratio. 


The waste ratio was calculated according to the dimensions of the space and optional tiles, as shown in Fig. 6. 


A 8 (s t F " j K i ° P Q & 


TILE SIZE ROOM DIMENTIONS TILES IN X DIMENTION | TILES IN Y DOMENTION 
TOTAL UNCUT curs 


TIKES THES maes [WASTE RANG 


UNCUT TOTAL UNCUT TOTAL 


mes TES 


LT=LENGTH OF TILE | WT=WIDTH OF TILE |LR®LENGTH OF ROOM WR=WIDTH OF ROOM mes LRAT mes | WR/WT 


Fig. 6: the Excel sheet of the waste Ratio 


30% 
25% 
20% 
15% 
10% 
5% 
0% 
60X60 50X50 45X45 
mRoom1 Room 2 


Fig. 7: Chart of waste ratio. 
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The schedule shows the results of the optional floor tiles, and the schedule shows the results for each space. 


e The room 600 x 200 cm shows the following: the waste ratio of the tiling 60 x 60 cm is 25%, the waste 
ratio of the tiling, the waste ratio of the tiling 45 x 45 cm is 25.71%, and the best ratio is 0% for the 
flooring tiles 50 x 50. 

e The room 400x300cm shows the following: the waste ratio of the tiling 60x60 cm is 14.29%, the waste 
ratio of the tiling, the waste ratio of the tiling 45x45 cm is 23.81%, and the best ratio is 0% for the flooring 
tiles 50x50. 


Therefore, the optimum selection of the floor tile for the two rooms according to the design approach is 50 x 50 
cm, as shown in the chart in Fig. 7. 


The next step is importing the sheet excel into Revit using a dynamo script. The minimum waste ratio values were 
selected. Then, the selected floor tiles were applied to each space. 


4. RESULTS AND DISCUSSION 


The proposed workflow and decision-making tool for choosing floor tile size by integrating BIM techniques and 
VE principles generates the waste ratio for alternative floor-tiling sizes. 


Rooms 1 and 2 have the same area (12 m2), but different widths and heights, which is used as an example of the 
current approach to floor tiles. After applying the research methodology and selecting the minimum waste ratio of 
the floor tile options, the waste ratios for Room 1 and Room 2 were reduced from 14% to 0% and from 25% to 
0%, respectively. The total number of cut tiles and unused tiles is clarified in Table 2. 


Table 2: Comparison between the default design approach and the optimized selection 


The original design The optimized selection 


Total tiles. Uncut Tiles. Cut Tiles. Waste Rate. Total tiles. Uncut Tiles. Cut Tiles. Waste Rate. 


Room 1 35 30 3 14% 48 48 0 0% 


Room 2 40 30 10 25% 48 48 0 0% 


4 [2 | Z | 


12 m? 


Table 2: Comparison between the default design approach and the optimized selection 


As shown in Fig. 8, the floor tiling plan of the two spaces with the size of the selected tiles has a clear number of 
tiles with no cut tiles as a result of the proposed workflow, which is the optimum design of any space to have clear 
tiles with no cut tiles for multiple vectors, such as visual design-wise, sustainable to reduced waste ratio, and fast 
application. 


Therefore, this design approach can be applied at different project scales with a large number of rooms to select 
the optimal floor tile size for each room and calculate the total waste ratio for all rooms that require the same floor 
tiles. Reducing the overall waste of flooring materials in the project eliminates the time required to apply value 
engineering principles in the project. 
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Selecting floor tiles in the AEC industry is a labor-intensive and time-consuming task. Architects often face 
difficulties in accurately creating shop drawings for floor tiles, owing to a lack of appropriate design tools. 
Consequently, they struggle to provide design support for subsequent stages, such as procurement and construction. 
This challenge becomes even more complex when architects need to incorporate waste reduction into their layout 
design. Consequently, the planning and cutting of floor tiles are typically performed extensively rather than 
precisely managed. This reliance on experience, rather than accurate calculations, leads to unnecessary material 
and labor wastage. To address this issue, we developed a workflow for generating accurate and comprehensive 
material waste rates. 


The value of the floor tiles after Implementation the bim 


process 

2 
1,8 
1,6 
1,4 
1,2 

1 
0,8 
0,6 
0,4 
0,2 p 

0 

60x60 50x50 45x45 


E function mquality mcost mvalue=(function+quality)/cost 


Fig. 9. The Result of the value engineering according to the integration of the Bim framework 


5. CONCLUSIONS 


This research proposes a workflow for selecting the optimized floor-tile size according to space percentage. The 
work limitation is on spaces with rectangular shapes with perpendicular angles, which is the most applicable space 
for material optimization in the VE process. All tiles must have uniform specifications except for size. The 
automation equation could be updated in future studies for application to regular spaces that are confirmed to have 
more than one rectangular shape by dividing the spaces into smaller rectangles. 


The workflow integrates BIM and VE equations, enabling architects in the early decision-making phase of the 
project to automatically calculate the waste ratio of each floor tile size option by inserting the optional sizes, 
outputting the floor selection by the minimum waste ratio for each space individually, which significantly reduces 
the material waste, minimizing the time wastage of the quantity surveyor engineer surveying the quantity of tiles 
manually from the shop drawings, and calculating the exact number of uncut and cut tiles to enable the procurement 
engineer to order the correct amount of flooring tiles. This methodology is a step in waste management research 
to reduce the material-waste ratio and help technical office engineers to enhance the process of selecting and using 
flooring materials. 


To enhance the entire tile design process, researchers proposed a workflow in the optimization process of floor tile 
planning to automatically generate the layout design of floor tiles, including uncut and cut tiles, after choosing the 
best tile size. [28]. Additionally, the two studies can be combined to form an integrated optimization process for 
the best tile planning. 
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ABSTRACT: As climate change intensifies, we must embrace renewable solutions like solar energy to combat 
greenhouse gas emissions. Harnessing the sun's power, solar energy provides a limitless and eco-friendly source 
of electricity, reducing our reliance on fossil fuels. Rooftops offer prime real estate for solar panel installation, 
optimizing sun exposure, and maximizing clean energy generation at the point of use. For installing solar panels, 
inspecting the suitability of building rooftops is essential because faulty roof structures or obstructions can cause 
a significant reduction in power generation. Computer vision-based methods proved helpful in such inspections in 
large urban areas. However, previous studies mainly focused on image-based checking, which limits their usability 
in 3D applications such as roof slope inspection and building height determination required for proper solar panel 
installation. This study proposes a GIS-integrated urban point cloud segmentation method to overcome these 
challenges. Specifically, given a point cloud of a metropolitan area, first, it is localized in the GIS map. Then a 
deep-learning-based point cloud classification model is trained to detect buildings and rooftops. Finally, a rule- 
based checking determines the building height, roof slopes, and their appropriateness for solar panel installation. 
While testing at the National Taiwan University campus, the proposed method demonstrates its efficacy in 
assessing urban rooftops for solar panel installation. 


KEYWORDS: Sustainable campus, renewable energy, point cloud segmentation, deep learning 


1. INTRODUCTION 


One of the most critical and urgent challenges we face in this century is climate change, resulting mainly from 
human mass consumption of fossil fuels. Thus, replacing fossil fuels-based energy with renewable energy is a key 
solution to this problem (IPCC, 2022). Increasing the supply and usage of renewable energy relies on not only 
efforts by power producers and public sectors but also energy-heavy industries and private sectors. Hence, in recent 
years, many major corporations worldwide have announced their targets for decreasing carbon emissions and 
increasing renewable energy usage to fulfill their corporate social responsibility (CSR), and higher education 
institutions are no exception. Many universities worldwide have also announced their climate targets and planned 
to increase the usage and supply of renewable energy for their university's social responsibility (USR) (THE, 2022). 
National Taiwan University (NTU) also announced its carbon-neutral target and pathway in Nov 2021. To achieve 
this goal, the decrease in building energy usage and the increase in renewable energy are two key strategies. For 
the latter, how to install more solar panels and smart energy systems on campus is a key question to be explored. 


To increase renewable energy supply, using spare spaces on building rooftops for solar panel installation is 
common in cities worldwide. Various factors affect the effectiveness of solar panel installation on the rooftops, 
including roof angles, shades created by nearby buildings or penthouses, and obstacles on the rooftops (Lin et al., 
2022). To evaluate the potential of solar panel installation effectively, the collection and creation of digital data 
and models of study objects become critical (Sierra et al., 2022, Chen et al., 2023). Airborne laser scanning is a 
common practice to capture the basic outline of study objects (Wang et al., 2018). After segmenting the collected 
point cloud data via deep learning models, large-scale building reconstruction and automated extraction of building 
instances can be easily achieved (Huang et al. 2022, Feng et al. 2022). 


This paper aims to analyze buildings' rooftops for solar panel installation through point cloud segmentation, taking 
National Taiwan University (NTU) in Taipei, Taiwan, as a study case. As the first university established in Taiwan, 
more than 100 buildings are on the main campus, built between the 1920s and the present. Large-scale building 


Referee List (DOI: 10.36253/fup_referee_list) 
FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup_best_practice) 
Aritra Pal, Yun-Tsui Chang, Chien-Wen Chen, Chen-Hung Wu, Pavan Kumar, Shang-Hsien Hsieh, Building Rooftop Analysis for Solar Panel 
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point cloud is collected using airborne LiDAR, and a commercial GIS tool, ArcGIS Pro, is used for further analysis. 
The analysis process and results are presented in sections 2 and 3, along with the challenges encountered. The 
research outcome can be a good reference for other university campuses which aims to use similar dataset for 
similar analysis. The main contribution of the paper is as follows: 


e — It proposes an end-to-end workflow for building rooftop analysis using point cloud data. 
e It also proposes a simple and fast methodology for point cloud segmentation using commercial software 
tools such as ArcGIS Pro. 


2. RELATED STUDIES 
2.1 Point Cloud Classification for Building Rooftop Analysis 


The application of point cloud data obtained from advanced technologies such as LiDAR has significantly 
transformed geospatial analysis (Dawood et al., 2017). This innovative approach has garnered the attention of 
researchers tapping into these datasets to extract valuable insights about urban landscapes and, more specifically, 
to evaluate the feasibility of deploying solar panels on building rooftops (Stack & Narine, 2022). A focal point of 
this effort lies in utilizing point cloud classification techniques, which can discern various objects and surfaces 
within the three-dimensional environment. By harnessing these techniques, researchers can effectively identify the 
detailed contours of building structures, ascertain the orientation of roof planes, and anticipate potential obstructive 
elements (Sun et al., 2016). This approach starkly contrasts traditional two-dimensional methodologies, allowing 
for a much more exhaustive and nuanced assessment of the diverse attributes associated with rooftops. 
Consequently, this three-dimensional perspective empowers researchers and planners to make informed decisions 
regarding optimizing solar panel placements and leveraging rooftops for sustainable energy generation (Stack & 
Narine, 2022). The fusion of GIS and point cloud data has opened new avenues for geospatial analysis, enabling 
researchers to analyze urban environments in three-dimensional detail. However, previous studies hardly 
integrated point clouds and GIS to check the rooftop suitability for solar panel installation. 


2.2 Deep Learning and Machine Learning Approaches 


The amalgamation of deep learning and machine learning techniques has marked a substantial leap forward in 
enhancing the precision and effectiveness of point cloud classification (Pal & Hsieh, 2021). Prominent among 
these methodologies are convolutional neural networks (CNNs) and other sophisticated deep-learning 
architectures that have showcased exceptional prowess in deciphering intricate roof structures and discerning the 
diverse array of rooftop attributes (Yang et al., 2023). These methodologies have emerged as dynamic tools capable 
of automatically extracting valuable information from point cloud data. This encompassing capability spans 
identifying building footprints, precisely measuring roof areas, and detecting possible shading elements (Pohle- 
Frohlich. et al., 2019). The cumulative result of these advancements is a substantially elevated accuracy and depth 
in evaluating rooftops' suitability for solar panel deployment (Tan et al., 2019). As these methods continue to 
mature and evolve, their application within the domain of point cloud classification holds tremendous promise for 
facilitating increasingly refined and reliable analyses, thus paving the way for more informed and effective 
decision-making processes related to solar energy integration. 


3. METHODOLOGY 


The methodology adopted in this study is divided into four steps: data acquisition, preprocessing, classification 
and analysis, and data aggregation. Figure 1 shows a graphical representation of the proposed methodology. This 
study uses ArcGIS Pro software for GIS-integrated point cloud classification and analysis. Details of each step of 
the method are explained in the following paragraphs. 


3.1 Data acquisition 


A UAV-mounted Light Detection and Ranging (LiDAR) device collects the point cloud data. This process begins 
with mission planning, outlining flight paths and parameters to ensure comprehensive coverage. Laser pulses are 
emitted toward the ground, and the LiDAR system calculates the return time to determine distances. The collected 
data generates a point cloud containing detailed 3D coordinates of terrain, buildings, vegetation, and other features. 
Georeferencing is achieved through GPS and INS for accurate positioning. Collected point clouds are stored in 
multiple LAS files of manageable sizes. LAS format is an industry-standard file format developed and managed 
by the American Society for Photogrammetry and Remote Sensing (ASPRS). It is a widely accepted and published 
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standard for exchanging LiDAR data. 
3.2 Preprocessing 


In the preprocessing step, the point cloud data is in LAS format imported into ArcGIS Pro software, where the 
"Create LAS Dataset" tool is employed to establish a structured dataset for subsequent analysis. To accurately 
register the point cloud data with the GIS map, the map data frame should be in the same coordinate system as the 
LiDAR point cloud tile. The preprocessing involves aligning the coordinate system, performing initial 
classification, differentiating ground points, and applying quality checks. The dataset is then clipped to focus on 
the specific study area, and optional filtering and compression steps are used to enhance data quality and efficiency. 
This processed dataset is the foundation for various geospatial analyses within the ArcGIS Pro environment, 
including classification, feature extraction, and terrain modeling. 


Preprocessing Classification & Analysis 


Data Aggregation 


Generate 


Fig. 1: Overview of the proposed methodology 


3.3 Classification and analysis 


Once the preprocessing is done, point cloud classification, roof slope analysis, and building height analysis are the 
next steps. Details of these steps are described below. 


3.3.1 Point cloud classification 


The point cloud classification is conducted in three steps: (1) using the built-in LAS classification functions such 
as Classify LAS Ground and Classify LAS Building. (2) 2D shapefile-based classification such as Set LAS Class 
Codes Using Features, and (3) Classifying a point cloud with deep learning. 


First Classify LAS Ground tool is used for the identification of ground points. Ground point assignment is reserved 
exclusively for LAS points with 0, 1, or 2 class code values. If LAS files employ distinct class code values for 
unclassified or ground measurements, the Change LAS Class Codes tool can reassign them correspondingly. Next, 
Classify LAS Building tool is used to classify building rooftops with class code values 0, 1, and 6. Before rooftop 
classification, LAS data must have classified ground points. This method may not classify points representing 
walls, vertical facades, and small rooftop features like chimneys. Before building classification, point cloud noises 
are filtered using Classify LAS Noise function. 


The 2D shapefile of buildings is used to enhance the classification quality further. LAS points intersecting the 2D 
positions of the input polygons are reclassified as buildings. Built-in function Set LAS Class Codes Using Features 
is used for this purpose. The ArcGIS software uses the American Society for Photogrammetry and Remote Sensing 
(ASPRS) defined LAS classification scheme. Although this step improves the classification, wall and facade 
classification is still challenging. 


In the final step, a pre-trained deep-learning model, PointCNN, for building classification is used to improve the 
classification results further. The deep-learning model is inputted as a Deep Learning Package (*.dlpk) in the 
ArcGIS software. Using the Existing Class Code Handling parameter control over modifications in the target LAS 
point cloud was achieved. Points already correctly classified (such as grounds) are kept unchanged. 


3.3.2 Roof slope and building height analysis 


Araster file is created using elevation values stored in the LiDAR points referenced by the LAS dataset for roof 
slope and building height analysis. Subsequently, a digital elevation model (DEM) of the buildings is created. Next 
Spatial Analyst function for Slope calculation is used to identify the slope from each cell of the raster file. The 
Slope tool uses a three-by-three moving window of cells to compute the slope value. The extent of values in the 
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output depends on the measurement units employed. When utilizing degrees, the range of slope values spans from 
0 to 90. The analysis can be accelerated using GPU. 


Building heights are estimated by comparing the digital surface model (DSM) and the DEM created earlier. The 
Zonal Statistics tool is used for this purpose. A zone encompasses all regions within the input that share an identical 
value. The input for defining zones can comprise both raster and feature data types. This tool helps in arithmetic 
statistics calculations such as Mean, Majority, Maximum, Median, Minimum, Minority, Percentile, Range, 
Standard deviation, Sum, and Difference. A zonal statistics raster is generated. In this study, the zonal statistics 
raster represents the building height. Finally, the suitability of the building for rooftop solar panel installation is 
determined by rule-based checking. 


3.4 Data aggregation 


During the data aggregation phase, the analysis outcomes, encompassing critical metrics like building heights, 
mean roof slopes, and evaluations of building suitability, are initially compiled into a structured comma-separated 
value (.csv) file. This file format aids in organizing the data for seamless processing and interpretation. 
Subsequently, this collected analytical information is integrated into the university campus's existing Geographic 
Information System (GIS) shapefile. This incorporation ensures that the analysis outcomes are appropriately 
aligned with the geographic context of the campus and can be readily accessed for further exploration. Additionally, 
an additional map layer is generated to enhance the visual representation of these analysis outcomes and promote 
efficient decision-making. This newly created layer is specifically tailored to present the aggregated results 
coherently and visually engagingly. The GIS-integrated dashboard within the ArcGIS software serves as a versatile 
tool for visualizing and interacting with the outcomes, facilitating comprehensive insights and informed actions 
based on the analysis conducted. 


4. RESULTS 


The proposed methodology was tested on the expansive 115-hectare main campus of National Taiwan University 
(NTU) in the Da'an District of Taipei. This sprawling campus hosts a multitude of academic and administrative 
buildings, exceeding a count of 100. The core objective of this study is to evaluate these buildings' viability for 
installing rooftop solar panels. To collect the crucial data, an unmanned aerial vehicle (UAV)-mounted LIDAR 
device was employed, effectively capturing the intricate point cloud representation of the campus. This extensive 
point cloud dataset was systematically stored in 13 distinct parts, all adhering to the standardized LAS format. 
Commercial software ArcGIS Pro was used for hosting, processing, and analyzing the point cloud and integrated 
GIS data. A computer with Intel 17-1370P central processing unit (CPU), 64-gigabyte (GB) random access memory, 
and 32 GB Intel®Iris® Xe Graphics graphics processing unit (GPU) is used to run the software tool. 


In conjunction with this voluminous point cloud dataset, a Geographic Information System (GIS) map showcasing 
the 2D polygonal representation of campus buildings was prepared. Figure 2 visually depicts the campus's point 
cloud model and the 2D GIS map. A comprehensive LAS dataset was constructed by amalgamating all 13 LAS 
files within the ArcGIS Pro software. This amalgamation was executed in accordance with the preprocessing phase 
described within the methodology. The subsequent phase encompassed applying a three-step classification 
technique to the compiled LAS dataset, enabling the precise segmentation of building-related point cloud data. 
The initial stage involved the classification of ground points utilizing the LAS code, following which the 2D GIS 
polygons facilitated the segmentation of the building point cloud. Eventually, implementing a deep learning model 
proved instrumental in impeccably classifying and segmenting the points that accurately represented buildings. 
The outcomes of this segmentation are depicted in Figure 3, which showcases the successful classification 
outcomes for ground and building points. 


Fig. 2: Point cloud (left) and 2D GIS map (right) of NTU main campus 
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Following the successful classification of building points, a customized DEM raster specific to the campus building 
was generated. Subsequently, the slope analysis tool was employed to accurately estimate the slope on building 
rooftops. In this study, the slope measurement unit was defined in degrees, resulting in values spanning the 
spectrum from 0 to 90 degrees. The ensuing outcome, depicted on the left-hand side of Figure 4, showcases the 
slope analysis raster. Significantly, the color coding scheme holds informative value: shades of green signify 
shallower slopes denoting suitability for solar panel installation, while shades of red signify steeper slopes 
indicating unsuitability. Notably, buildings characterized by substantial rooftop obstructions are prominently 
indicated in red hues. Furthermore, the color transition observed at building edges is attributed to shifts in slope 
characteristics, rendering rooftop edges prominently delineated in a striking red color. 


Fig. 3: Point cloud classification results: ground classification (left) and building classification (right) 


Building heights were estimated through a comparison between the DSM and DEM rasters, utilizing the Zonal 
Statistics tool. The outcome of this building height estimation is rendered as a raster, distinguished by color codes 
corresponding to different heights. This representation is exhibited in the diagram on the right-hand side of Figure 
4. Observations indicate that the tool proficiently determined building heights in the majority of cases. However, 
a few instances (depicted in red) disclosed significant inaccuracies in estimations. Subsequent in-depth analysis 
established that factors such as noise within the point cloud data, substantial tree obstruction, and inherent point 
cloud incompleteness substantially influenced the tool's performance. 


Finally, the roof slope analysis and building height estimation results were exported into a .csv file, and the existing 
GIS shapefile was updated. The NTU's GIS dashboard is used to display the analysis results. It can help university 
administrators to make decisions in a more interactive way. An example of data integration for five buildings is 
shown in Table 1. The check column of the table shows the suitability of the building for rooftop solar panel 
installation. The checking result was incomplete for the civil engineering building because of the point cloud 
incompleteness. The GIS representations of these buildings are shown in Figure 5. 
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Fig. 4: Results of roof slope analysis (left) and building height estimation (right) 


Concluding the process, the roof slope analysis and building height estimation outcomes were transferred into 
a .csv file, and the pre-existing GIS shapefile was updated concurrently. The integrated results are seamlessly 
displayed through NTU's GIS dashboard, enhancing the capacity of university administrators to make decisions in 
an interactive and informed manner. This holistic approach facilitates a more dynamic decision-making process. 
Exemplifying the integration, Table 1 demonstrates data amalgamation for five specific buildings. Notably, the 
"Check" column within the table signifies the suitability of each building for rooftop solar panel installation. 
However, it's essential to underscore that the checking process remained incomplete for the civil engineering 
building due to inherent point cloud incompleteness. The visual GIS representations of these buildings are depicted 
in Figure 5. This comprehensive integration emphasizes the analytical and visual richness of the approach. 
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5. DISCUSSIONS 


Although the methodology successfully analyzed building rooftops for solar panel installation, its performance is 
affected by several factors: point cloud completeness, obstacles from tree leaves, noise in the point cloud data, etc. 
This method faces a challenge in rooftop analysis of the shorter buildings at the NTU campus because the rooftops 
of such facilities are often obstruction by tree leaves. Also, the state of these leaves is inherently unstable, subject 
to growth, pruning, and even shedding, introducing an element of uncertainty into the ongoing analysis. A potential 
solution could involve implementing filtering criteria during the initial point cloud scanning phase. This strategic 
approach would enable the retention of essential data points, subsequently alleviating the workload and refining 
the data for subsequent analysis. Also, the accuracy of the classification methods is subject to several factors, such 
as the completeness of the existing 2D GIS map data, proper alignment of the map and point cloud data, and the 
efficiency of the deep-learning model. The incompleteness of the map data and slight discrepancies between map 
data and point cloud data often leads to manual adjustments. 


Table 1: Integration of analysis results in campus GIS map 


Building Name Area. Mean slope ( ° ) Height (m) Check 
NCREE 13135.83 15.96 21.93 Great 
College of Liberal Arts 5980.44 33.23 11.96 Ok 

Dept. of Chemistry 11460.91 37.71 27.19 Ok 
Floricultural Hall 1381.30 41.26 13.04 Ok 

Dept. of Civil Engineering 9686.44 60.05 21.75 Insufficient Data 


LN | EB Le INI 


Fig. 5: Building displayed in GIS dashboard. From left: NCREE, College of Liberal Arts, Dept. of Chemistry, 
Floricultural Hall, Dept. of Civil Engineering 


6. CONCLUSION & FUTURE WORK 


In conclusion, this study presents a holistic approach to evaluating the suitability of building rooftops for solar 
panel installation, contributing to the overarching goal of mitigating climate change through renewable energy 
solutions. The pressing need to curtail greenhouse gas emissions highlights the imperative to transition away from 
fossil fuels. Integrating renewable sources, particularly solar energy, is pivotal in this attempt. Rooftops provide 
an underutilized space for solar panel deployment, offering decentralized clean energy generation. However, the 
effectiveness of such installations hinges on accurate assessments of rooftop attributes. This research introduces a 
sophisticated methodology amalgamating Geographic Information System (GIS) techniques with advanced point 
cloud segmentation methodologies. By harnessing airborne LiDAR technology and leveraging deep learning 
models, the proposed approach deftly addresses challenges such as rooftop slope analysis and building height 
determination, ensuring the accuracy and applicability of solar panel placement assessments. The study's 
application on the National Taiwan University campus confirms the practical viability of the methodology. 


As universities and organizations worldwide set ambitious carbon neutrality goals, the methods outlined herein 
provide valuable tools for optimizing renewable energy integration. Moreover, the interdisciplinary nature of this 
research, encompassing spatial analysis and environmental considerations, exemplifies the multi-faceted approach 
required to address complex challenges like climate change. We can advance our understanding of sustainable 
energy practices through such interdisciplinary endeavors and work towards a greener and more resilient future. 
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ABSTRACT: Building information modeling shows its potential in the performance driven design, where 
multiple design solutions are generated and assessed against certain design goals. This paper proposes a 
workflow for the air-conditioning system design and simulation on thermal zoning level. Thermal zoning plays a 
pivotal role in the design thinking of engineers, synthesizing load calculation, equipment sizing, and pipe/duct 
layout. However, it is often done intuitively with its effectiveness and performance unclear at the outset. To make 
it quantitative, we decompose the zoning process into two levels (control/system) of space aggregation, joining 
both semantic and numeric characteristics. For the semantic part, space functions are considered through space 
labeling, accessibility, and adjacency. Regarding the numeric part, spaces are zoned based on their thermal 
response similarities, using dynamic mode decomposition of the simulated indoor temperature. A two-level 
hierarchy of duct/pipe network is generated. It connects spaces within each control zone at the second level, and 
terminal equipment of each zone at the first level, representing typical fan-coil or variable-air-volume systems. 
For each zoning scheme, the network and configurations are serialized as Modelica scripts for co-simulation 
with EnergyPlus. The designer can evaluate different zoning schemes in terms of initial cost, energy consumption, 
and comfort level, based on the simulation result. The entire workflow is implemented in Grasshopper with self- 
coded plugins. 


KEYWORDS: BIM, Thermal zoning, Generative design, HVAC system, Performance simulation. 


1. INTRODUCTION 


Building Information Modeling (BIM) provides solutions to the design and management problems in the realm 
of architecture, engineering and construction, as a platform for data and cooperation. The information can be 
delivered or retrieved by model view definitions (MVD), BIM query languages, or application programming 
interfaces (API). However, they barely offer model transformation, which involves the insertion, addition of 
level of detail, or aggregation of objects (Fischer, 1998). In the forward design process, the model transformation 
expands the room-based model into another space view for analysis (Suter, 2022), such as energy performance or 
system design. 


Zoning is to transform the room-based model (architectural geometry) to a zone-based model, bridging BIM 
with building energy modeling (BEM), which is also a critical step in the design of an air-conditioning (AC) 
system. It is more like an “art” for so many factors to consider, such as space function restriction, space load 
profiles, convenience for ductwork, balance of performance and cost, and even user preference, both semantic 
and numeric and hard to quantify. The engineers usually solve the zoning problem intuitively by rule of thumb, 
leading to one solution. How will it affect the system performance or whether there exists an optimum zoning 
scheme remains unsolved. Back in 2001, Brahme et al. (2001) investigated the generative ducting based on a 
grid system at the initial stage. Berquist et al. (2017) continued the idea of generative design and experimentally 
piloted different zoning on several rooms. Bres et al. (2017) examined the zoning effect of the water heating 
system for residential buildings, with detailed simulation feedback from TRNSYS. With the lower cost of 
simulation, it is quite possible to automate the system design and simulation at the zoning phase. 


In system design, a thermal zone represents the spaces (part or aggregation of rooms) with heating and cooling 
requirements that are sufficiently similar so that desired conditions can be maintained throughout using a single 
sensor (denoted as control unit). However, in simulation, the thermal zone stands for the spaces that can be 
lumped together as a single air node (rephrased as a simulation unit), where the parameters of air are uniform. 
The difference is that the simulation unit is scalable, depending on the modeling target (Fig. 1). For example, it 
should be in line with the control unit when modeling buildings in operation, as recommended by the ASHRAE 
Standard. When it comes to the building massing, a shoebox model is enough to study how geometric form 
affects energy performance. Even one thermal zone for a building is acceptable in city-scale simulation and 
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planning. In the system design, the control unit is the smallest on the building scale, which considers the load 
similarity. When the building gets larger, more system units emerge with load diversity considered. For example, 
on a system zoning level, equipment can be downsized by staggering the load profiles of control units. A proper 
zoning and routing of distribution network can reduce the power of fan/pump. More energy distribution can be 
regulated on a larger scale, such as the exhibition center that has multiple plants or power stations. 
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Fig. 1: Different levels of zoning in system design and simulation 


Ideally, one should set the control unit exactly the same as the functional space. However, it may have more 
complex plumbing, ductwork and sensor/equipment installation, increasing the initial cost. There is a trade-off in 
controlling multiple spaces by a single thermostat since it does cut the cost while causing overheating/cooling. 
To manifest such an effect, the simulation unit must be smaller than the control unit, or the temperature 
difference will be evened out during the parameter lumping. Hence, this work takes the functional space as the 
atomic simulation unit and focuses on the zoning of control unit and system unit. 


It is not quite straightforward to evaluate different zoning schemes since it is concerned with the actual system 
performance and the cost of related equipment configurations. A view model of the distribution network would 
help with the cost estimation and even the system energy modeling. Bres et al. (2017) pipelined the distribution 
network generation of the water heating system for residential buildings, by finding the minimum spanning tree 
(MST) from the potential zone centroids and space boundary vertices. Medjdoub et al. (2018) developed a 
method for open space fan-coil system layout under specific restrictions. Chen et al. (2022) solved the layout of 
diffusers and ducts for open spaces with hydraulic balance considered. In related research, the distribution 
system above the thermal zone level has been rarely visited, especially for non-residential buildings. The 
problem of distribution network layouts is similar to the design of integrated circuits (Held, 2011) or indoor 
navigation networks (Fu, 2020). 


Following the thread of Bres et al. (2017), this paper details the workflow of thermal zoning and the modeling of 
air-conditioning distribution systems for office buildings, which can offer multiple zoning schemes in primary 
design with simulation feedback. After an overview of the methodology, the rest of the paper will be organized 
into four parts: space view generation, thermal response analysis and zoning, pipe/duct network generation, and 
model scripting. Each part is exemplified by the same floorplan for reference. 


2. METHODOLOGY 
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Fig. 2: The semi-automatic workflow of generative thermal zoning toward system simulation 


Fig. 2 shows the overall workflow that takes the BIM model (IFC file) as input. In the first preparation stage, the 
user needs to manually check the integrity of the information model. All spaces must be defined and labeled with 
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their function correctly. All spaces must have boundary components and the 2"-level boundary defined. To 
facilitate the testing, templates of construction type, space schedule and load settings are used. The model 
conversion from BIM to energy model implements IfcOpenShell (Visschers, 2016) and OpenStudio SDK. 


The first step includes previous work by Suter et al. (2013), where a method is described to define space views 
and transform room-based source space data into corresponding space models. Following certain view 
definitions, the space layout can be selected, aggregated, or decomposed into functional views that are relevant 
to schematic design (Suter, 2015). The pedestrian space access network is used to identify circulation areas and 
shafts for plumbing, and the functional unit view for space groups with similarities. 


The second step applies Dynamic Mode Decomposition (DMD) (Schmid, 2022) to group spaces with similar 
thermal responses, similar to the Koopman Analysis application by Georgescu (2015), only more sampler and 
cost-effective in computation. By tuning the clustering threshold, multiple thermal zone views are generated. The 
functional unit view and thermal zone views are overlaid as zoning schemes, satisfying both semantic and 
physical requirements. 


The third step implements the potential network that conceptualizes pipe/duct layout in early design, guiding the 
system distribution network routing. It does not reflect the actual design so the pipe/duct and the equipment may 
overlay in the schematic diagram. The network first joins spaces of a thermal zone and then connects each zone 
to the designated shafts. A JSON file collects the geometry and topology of the generated system, with 
information of peak load, thermostat inherited from zoning schemes. 


The fourth step parses the zoning and system into the simulation model. For system simulation, Modelica is used 
to reflect the control effect across spaces of a thermal zone, which is the temperature bias caused by sharing a 
thermostat, while EnergyPlus handles the building physics as co-simulation. The workflow is able to reflect the 
system performance by simulation outputs (energy consumption and comfort level) and quantity takeoffs 
(equipment, pipe/duct/junction). 


Apart from the space view generation (space modeling system, Suter, 2022), the workflow is implemented on the 
Grasshopper platform with the help of LadybugTools (Roudsari, 2013) and self-coded components. Modularized 
components make it easier for testing and visualization. The toolkit! is written in C# with Rhino core algorithms 
and CGAL. 


3. FUNCTIONAL UNIT VIEW 


Although thermal zoning follows clear yet implicit physical requirements, there are more explicit semantic 
restrictions to it, such as the fire compartmentation forbidding cross ventilation, or the tenant zone for separate 
energy management. Additionally, designers must consider space functions to avoid serving bathrooms and 
offices with the same duct system. The functional unit view can depict such a function isolation. 


In previous work (Suter, 2022), a data processing pipeline was proposed for defining space views using space 
ontologies and layout transformation operations. The generated space view model can help with the space layout 
analysis by offering insights int space function, accessibility, orientation, daylighting or ventilation. In the first 
step, room-based source data are extracted from BIM by IFC class filters, such as space geometry and related 
objects. The second step transforms the data to a source space layout, where spaces are labeled automatically or 
manually. Labels are assigned by default according to the IFC to space ontologies class mapping. Additional 
labels are inferred by semantic reasoning. In the third step, the source space layout is transformed into a certain 
space view, by its defined operation sequence (including filtering, selection, aggregation and update). 


The functional unit view uses the space access network to identify the cluster in terms of adjacency and function. 

The space access network originates from a spatial relation network that builds upon centroid nodes of all layout 

elements (e.g., spaces and doors), with their spatial relations as edges. Such relations include containment, 

adjacency, proximity, and partial enclosure. A door adjacent to two spaces indicates an accessible path lies in 

between, while the isolated node with no accessibility proves to be a shaft. Different levels of depth in the space 

access network tell how important a space functions as a circulation area. Typically, the main circulation space 

contains the elements accessing each functional unit, which form a node cut-set that partitions a space access 

network into multiple components. Spaces are then merged as a functional unit with classification (function label) 
inferred from the spaces within. 


! https://www.github.com/ian-quinn/tellinclam 
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Fig. 3 Sample floorplan? Fig. 4 Generated view of (a) functional unit (b) space access network 


Fig. 3 displays the floorplan used for this work, labeled with detail functions. In the example (Fig. 4-a), there are 
seven major functional units: two marked as educational units (EduU) that include classrooms and affiliated 
spaces; two marked as office units (OffU) according to their major function; two meeting units (Mt); one 
circulation unit including the main corridor, stairs and atrium. This view will act as an overlay to the thermal 
zoning, reflecting function restrictions. 


4. THERMAL ZONING VIEW 


The control needs for air-conditioning are directly reflected by the thermal responses of all spaces. Taking the 
whole floorplan as one dynamic system, each space may have different characteristics. The essence of thermal 
zoning is to apply the same control logic to several spaces sharing similar dynamic characteristics, thus 
achieving an acceptable control effect with less controller and actuator. 


Although many factors are intertwined in this complex dynamic system, such as operation schedule, internal 
loads, solar radiation, thermal resistance and thermal mass of structure components, the indoor float temperature 
is one direct externalization of such dynamics. By measuring the time series of temperature waves, we may 
identify the dynamic characteristics of each space, and aggregate them as thermal zones based on their similarity. 
Inspired by Georgescu (2015), Dynamic Mode Decomposition (DMD) is used to perform the Koopman Analysis, 
extracting the dominant modes and their properties based on the free-float temperature data. 


The general idea of the Koopman Analysis is to study the time evolution of observables under iteration of a 
nonlinear system through the Koopman operator U, which is linear but infinite dimensional (Raak, 2016). 
Considering a dynamic system x; = f (xx) evolving on a manifold M, f is a non-linear map describing how x 
evolves in discrete time. The operator U acts on the scalar function g:M — R (1), which is the selected 
observable (or the system output), then describes its evolution in a linear, infinite space, along with the system f: 


Ug:=g9ef Ug(x,) = (f(x) = g(Xk+1) (1) 


Given eigen-decomposition of U (2), we may generally express the vector function g: M — R”, in terms of 
Koopman eigenfunctions ġ; and eigenvalues A; by (3), 


9) =) by I= Y Apa 8) 
j=1 j=1 


where {v;} is a set of vector coefficients called Koopman modes of map f. The set of eigenvalues {A;} indicates 
the growth rate and frequency of each mode. The practical idea behind this is to collect a set of data, identify 
observable g of interest, and then express it in terms of Koopman modes and eigenvalues (Chen, 2012). For 
example, one may take the time series temperature of spaces as observables (data snapshots X(n x m)), and 
analyze the thermal responses of the complex system by eigenvalues and modes. 


? https://www.angelo.edu/live/news/12569-no-place-like-home 
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The DMD algorithm approximates the modes and eigenvalues by a finite data set, with a variant of the Arnoldi 
method described in Chen et al. (2012) and summarized as (Alg. 1). Assuming the dynamic f is linear with 
f(x) = Ax, then eigenvalues of A are also eigenvalues of U. Furthermore, if g(x) = x, then modes vj are the 
corresponding eigenvectors of A (Rowley et al., 2009). However, eigen-decomposition of A is hard to solve 
directly due to its large dimensions. Alg. 1 further approximates A by projecting it onto one Krylov subspace K 
expanded by K, then calculating the eigenvalues and eigenvectors with the low-rank operator. Note that in real- 
world non-linear problems, there must be a bias r representing Xm within K, it is critical to find suitable 
constant vector(s) c to minimize r in least-square sense (Alg. 1 Line 2). Then, C is regarded as an approximation 
of the action of the Koopman operator on the associated finite dimensional space K (Raak, 2016). The resulting 
empirical Ritz values and vectors (Alg. 1 Lines 7, 8) behave in precisely the same manner as the eigenvalues and 
modes v of U (1). The theoretical deduction can be found in Rowley et al. (2009). 


Algorithm 1 DMD (one variant of standard Arnoldi) 


Input snapshots of observable [£o Z1- £m); £e E R” 
Output Ritz values {Aj}, Ritz vectors {v;i} 
tK: [zo e+ Zm IE [C0 «++ Cm i]? 
2: Zm =Ke+r, rl span(zo...2m-1) > find constant c to construct £m with 
minimal residual in least-square sense 
3: c= Krz,, > one solution by observation 
t K+ =(K*K)~'K* > when K is not full rank 
5: Construct the companion matrix 
0 Ose O w 
1 0 0 c 
C: 01 0 © 
0 0 = Arasia 
6: C =TAT > one possible decomposition 
7: Output diagonal A as Ritz values {A)....,Am} 
8: Output V := KT™~! as Ritz vectors {v1.... Um} 


There are various approximation methods of A span over the algorithm spectrum, from Koopman analysis 
(accurate, complex) to DMD (coarse, simple). For example, one can implement a classical Arnoldi algorithm, or 
by other variants like QR decomposition, SVD decomposition (standard DMD), or Proney-type method (Hankel 
DMD) (Schmid, 2022). Alg. 1 applies a variant Arnoldi method that takes c = K*x,, as one solution for Line 2 
by observation. It is not unique and the result needs cross-verification. 


By simulation, a typical school building may yield year-long (8760 hours) time-series data on the free-float 
temperature, of hundreds of spaces. The space dimension of each snapshot is far less than its time dimension (n 
< m), where rank deficiency is inevitable. The standard DMD method performs poorly because its SVD process 
truncates the matrix to nX n with lots of information loss on the time dimension. While, the Arnoldi method 
gives nice accuracy because it expands the matrix to m X m without truncation. A similar method is applied to the 
air temperature analysis of a conditioned room (Boskic, 2020), with the data dimension 28 X 241, and a power 
system whose data dimension is 7 X 24~120 (Raak, 2016). 


In order to get the free-float temperature and perform the analysis, the following workflow is implemented based 
on the Grasshopper platform. It includes three steps: 1) build up the energy model by retrieving IFC information; 
2) implement the Arnoldi algorithm on the temperature series output by simulation; 3) perform hierarchical 
clustering and output multiple thermal zoning schemes. 


The building floorplan can be manually drawn or transplanted from IFC by IfcOpenShell (Visschers, 2016). The 
LadybugTools (Roudsary, 2013) helps with the energy modeling based on the geometry and function labeling, by 
calling OpenStudio SDK. EnergyPlus 9.6.0 performs the year-long simulation without any system, using 
construction and space function templates from ASHRAE Standard 189.1-200. Because EnergyPlus 9.6.0 does 
not support thermal zones with multiply-connected region, some corridors are further divided in this case. 


Fed with the temperature series, Alg. 1 yields the empirical Ritz vectors and values approximating the Koopman 
modes v and their eigenvalues A. If the complex number A falls near the unit circle, it represents a more steadily 
evolving mode. Here we name the Growth as |A|, the Norm as |v|, and the Frequency as Im(log(A))/27At. At 
stands for the sampling period which, in this case, is 1 hour. 


The dominant modes must be selected and cross-validated because the Arnoldi method generates tons of modes 
(equal to the number of time dimensions). A simple way is to identify the energy-intense frequency bands, then 
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rank all modes by their growth value in descending order, and look up for those modes with the largest norm. In 
this case, by Discrete Fourier Analysis, the frequency bands of 1/8760, 1/24, and 1/168 take up over 95% energy 
of the entire system, which corresponds to the year, day and week period, in line with the actual operation 
schedule. In this way, the dominant modes are highlighted in red in Table 1. 

5845 0.999960 261.9814 24.0553 


| 0.02 
212 -0.02 
5846 0.999960 303.0435 -24.0553 Amplitude 

5779 0.999950 150.2885 23.9970 Fig. 6 The distribution of amplitude and phase of mode 5799. 


Table 1: Modes ranked by growth 


mode growth norm 1/frequency 
8237 1.000038 9.185662 9656.458 
8238 1.000038 13.93669 -9656.458 
8418 1.000029 51.34232 168.4023 
8419 1.000029 64.30705 -168.4023 
5799 1.000026 249.9451 23.92526 
5800 1.000026 267.1617 -23.92526 
8239 0.999999 82.70703 œ 


Phase 


Within the dominant mode, each complex number in v represents the dynamic characteristics of a space. In Fig. 
6, each space is rendered by pseudo color based on the calculated amplitude (norm) and argument value as 
follows. Space aggregation can be identified visually. 


Im(v;) 


Amp = Re(v;)? + Im(v;)}, Arg = arctan Rew, 
Ll 


To make it quantified, the hierarchical clustering is implemented on the 2-dimensional data space by their 
Euclidian distance. The hierarchical clustering gradually increases the threshold of cluster distance (e.g., the 
similarity of dynamic response), and generates multiple layouts (Fig. 7) as the generative zoning process. 


a) 2 clusters b) 7 clusters c) 16 clusters 
7 regions 24 regions 37 regions 


Fig. 7: Different thermal zoning schemes based on different cluster distance. 


The final zoning scheme is the overlay of the thermal zoning view and the functional unit view, with all 
unconditioned spaces ignored. Spaces already grouped in thermal zoning may be partitioned again by the 
functional unit view. 


5. PIPE/DUCT NETWORK VIEW 


To better evaluate the effectiveness of zoning schemes, a detailed pipe/duct network view for the distribution 
system is needed. The designer may grasp a basic idea of system layout and initial cost through the view, even 
the performance of comfort control and energy consumption via automatic simulation. Similar to the space 
access network, a pipe/duct network is introduced to describe the path how cooling/heating energy is delivered to 
each space. Since the zoning process focuses on space aggregation, the terminal ductwork within each space 
remains unaltered and is consequently not taken into account. 


Bres (2017) devised a loop pattern for the water heating system generation, connecting radiators of each zone 
with pipes around the floorplan perimeter. Due to the buoyancy effect, the radiators are normally located at the 
baseboard so the water loop should avoid the circulation area. On the contrary, the air-conditioning system 
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incorporates sizable equipment and ducts, making it more cost-effective to distribute cool/warm air through the 
ceiling void, from core to perimeter. With such a tree pattern, the pipe/duct network starts from core shafts or 
mechanical rooms and then spreads outward via circulation areas to terminal spaces. 


Pipe and duct require dedicated space. Typically, the ceiling void in circulation areas is allocated for the 
installation of electric wiring, ductwork or plumbing. These circulation areas can be identified based on the 
space access graph in section 3 (Fig. 4-b). In graph theory, the degree of a node denotes the number of edges 
incident to that node. Thus, the inaccessible shafts have a degree of zero while the corridor usually has the 
largest degree. Other spaces (degree > 2), such as waiting rooms, may serve as circulation areas in their own 
functional unit. The space view in Fig. 8 can be drawn following the methods in section 3. 


Algorithm 2 Generate Skeletons 

Input rooms as polygon set R = {Po, Pi,..., Pa}. Within set Po rep- 
resents the outer boundary while others inner holes. nmin, mex 

Output list of edge set list{S} 


1: Merge any two set Ri and Rz if Rı[0] and R2[0] overlap, then replace 

R,(0] and R2[0] by the polygon of their Boolean union. 
2; Initiate list{ S$} s- 
3}: for each remaining R do 


Solve Straight Skeleton by R, build graph G(V, E) from generated 
inner bisectors, output other bisectors as edge set B 
d+ 0 


6 for each v in V do Pp! — 

7: lime(v) + “time” value of the vertex l 

8: degree(v) — the degree of vertex Ty n 

9: if degree(v) = 1 then 

10: if time(v) < nmin then 

l: remove v from V r 
12: else 


13: Find edges ¢;. e2 incident to v from B. then add their 
bisector as edge to G E 


corridor 

l4 if time(v) > d and time(v) < Nimes» then 
E functional spaces 
15: d + time(v) as Circulaton area 
16: Get offset polygon set C by Straight Skeleton algorithm from R by — $S inner bisectors $ 

inward depth of d SS offset polygon P' 
17: Break up all edges in S by intersection points with C edge removed 
18: Remove edges from S that are inside the region formed by C 
19: Merge edges in C with S, then append S to list{S} Fig. 8: Circulation areas and the prototype network 


Upon the Boolean union of circulation areas, Alg. 2 generates the prototype of potential network by the Straight 
Skeleton (SS) algorithm. The straight skeleton is defined by continuously moving the polygon edges inwards 
parallel to themselves at a constant speed. Edges may split in two (split event) or vanish (edge event) due to 
vertices collision. During this offset process, a set of lines will be traced out by the moving vertices, which is the 
straight skeleton. It represents the shape well by centerlines, making it suitable for guiding the installation of 
ductwork. However, for open space such as foyer, the pipe/duct usually walks along the wall (not across the 
space), which means the offset process should stop at a certain depth. To achieve this, Alg. 2 takes a two-step 
skeleton generation. Firstly, generate the straight skeleton from the boundary polygon P (Sugihara, 2013), and 
filter out all interior bisectors as a set $. Each vertex in it has a “time” attribute that marks how long it walks 
during the offset process. Secondly, generate the offset polygon P’ by the largest “time” value below the 
threshold Nmax (3m, for instance). Thirdly, perform a Boolean subtraction on S using P’, and then merge it with 
P’ as the final prototype S (Fig. 8). 2 X Nmax represents the typical width that distinguishes a corridor from a 
hallway. 


In terms of the actual pipe/duct layout, the resulting set of line segments has too many joints, twists and branches. 
They need further simplification by extension, pruning and alignment. For extension, a vertex generated in an 
“edge event’ needs to be connected to the mid-point of the edge from which it is collapsed (Fig. 9-b). For pruning, 
a vertex with too small “time” value below the threshold nmin (1m, for instance), as well as the edge incident to 
it, needs to be pruned (Fig. 9-c). 2 X Nmin represents the minimum width of a typical corridor. Such redundant 
“branches” are often caused by zig-zag polygon boundaries. For alignment: 1) Considering all prevalent edge 
directions (edges below length threshold d are ignored), edges in S are grouped by Quality Threshold (QT) 
clustering with threshold d. The QT clustering takes the maximum cluster diameter as input, finding clusters 
with guaranteed quality. 2) In each group, an edge can be the alignment baseline only if the total swept area by 
projecting others to it reaches the minimum. If multiple candidates exist, pick the median. The axis is a line 
segment that connects all projected vertices along that baseline. 3) The axis will pull the nearby vertices (within 
the range of d) onto itself while keeping their edge connections. The alignment process aims for the most 
simplified S, by iteratively increasing d until S intersects with any space boundary. 
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Algorithm 3 Align Skeletons 


Input distance threshold d, angle threshold @, set of edges S 
Output set of aligned edges 


1: Initiate list(aris) 
Divide S by Quality Threshold clustering into {5o, -.. 
rameter d and @ measuring the distance and angle between two edges. 


2: + Sn}, taking pa- 


Edges with length smaller than d are ignored. 


3: for each S’ in {So,...,S,} do 

4 bor + outward offset polygon by d 

5 Group edges if their bor overlap, resulting {5j,...,¢ Si} 

6 for each S” in {S),.....S%} do 

7: AFCümin — CO 

8: axis + S"(0| 

9 for each edge e; in S” do 

10: area + 0 

iW initiate list(point) 

12 for cach edge ej in S” except for e; do 

13 area + area+the area of region swept by ej projected to e; 
14 append projected endpoints to list (point) 

15: if area < areQmin then 

16: Af Câmin & area 

17: axis + new edge spanning all points in list(point) 
18: Append avis to list(axis) 

19: Build graph G(V, E) from § 


2: for each edge aris in list(aris) do 


21 for each vertex v in G(V, E) do 
22: if distance between v and axis < d then 
23: Project v to axis 


24: Output E 


QT cluster range le, €, €, ¢,} 


seline 


jaam: a. 

(a) 

(b) | (c) 

kg A 
contour y = 
R bisector timely) > n, 
inner bisector 
2 edges removed 
>| 
tima(v) <4. 


Fig. 9: (a) edge alignment on QT clustering. (b) 
extrusion of inner bisector with 1-degree vertex. (c) 
remove vertices with too small “time” value. 


In this work, the door locations serve as the entry points to wire in the main network to each space. Such 
information can be retrieved from BIM (either by IFC or gbXML). If not available (such as a shaft or part of an 
open space), the centroid of the space region will be used instead. The algorithm finds the Manhattan path 
between the entry point and the main network by the minimum cost, and then adds them to $. Additional 
penalties will be counted if the path collides with space boundaries. The line segment set S forms a graph 
G(V, E), serving as the potential network for the detailed layout of pipe/duct system. 


According to the hierarchy of zoning outlined in section 1, the terminal nodes (functional spaces) within a 
thermal zone are connected to a distribution node, representing a variable-air-volume box or air handling unit 
(AHU). Subsequently, all distribution nodes are connected to the source nodes (shaft or mech rooms). All routes 
follow the potential network given by an undirected, weighted graph G(V, E). The 2"4-level connection is a 
Steiner tree problem on terminal nodes N & V with Steiner points provided as V \ N. The 1*-level connection 
can be regarded as the Shortest Path Tree that grows from one source node to all distribution nodes. 


Algorithm 4 Get Steiner tree of subset vertices from a graph 

Input undirected, weighted graph G, terminal vertices set N 

Output Steiner tree graph T 

Create the sub graph H(V,£) by Floyd-Warshall algorithm, making 
V DN and E contains all possible path between any v € N 


Initiate set Vie 
Initiate adjacency matrix Ali. j] + e(vj. vj), ij ElV] 
: Initiate adjacency matrix afi, j] + weight(e(v;, vj)) 
for k from 1 to |V| do 

if degree(vy,) = 2 then 
7 vj, vj — neighbours of ve 


8 Ali. j] + Afk, i] U Afk, j], then Afk, i], A[k, j] 4 
9 ali, j| — afk, i] + afk, 7), then afk, i}, alk, j] + 0 
10 Append vk to Vier 


Generate minimum spanning tree T(V’, E')by Kruskal algorithm based 
on adjacency matrix ali, j] 
for each e{v;,vj) in EY do 


e(vj, vj) — Afi, j] > map edges back to H 


13 
14: V’ 4 V' U Via > bring back relay vertices 


15: Simplify T then output 


Algorithm 5 Find centroid of tree 
Input undirected, weighted tree graph T(V, £) 
Output directed, weighted tree T” 
1: Start form any vy € V. find the furthest vertex up by Dijkstra algorithm 
: Start from vp, find the furthest directed path = {up vn} by the same 
Dijkstra algorithm 
: Locate the middle point v,, on path by length 
t: VU {tm}. EU {(tm—1, tm), (tm, tm+1)} 
5: Traverse T from tm down to leaf node to build the tree 7” 


ny 


shaft 


— level network 


= 2™-level network 
potential network 
distribution equipment 


Fig. 10: A sample duct/pipe generation based on the 
potential network 
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Alg. 4 describes the steps for solving the 2"-level connection within a zone: 1) Create the sub-graph H based on 
G, containing the terminals in set N and their possible paths by modified Floyd—Warshall algorithm. 2) Map the 
graph H to H’ by removing relay nodes (degree = 2). 3) Find the MST joining all nodes in the graph H’ by 
Kruskal algorithm, then map it back to H. The resulting graph T is the Steiner tree that connects all terminals. To 
locate the terminal equipment that handles the zone air, Alg. 5 first finds the longest path P by running the 
Dijkstra furthest path algorithm twice from any node, then adds the mid-point of P as the root node of T. The 
root node minimizes the average path to leaf nodes, implying the minimal cost of distributing cooling/heating air 
to each space. 


All distribution nodes are connected to the source nodes (shaft or mechanical room) along the potential network 
G, in the 1*-level connection. They are clustered by the shortest walk to the nearest source candidate, then get 
connected by a shortest path tree rooted in that source node (Alg. 4, with Kruskal-MST replaced by Dijkstra 
shortest-path-tree). It is an edge-weighted breadth-first search that traverses a tree down to every leaf with the 
same speed. 


For simplification, the graph G takes the length of each duct/pipe segment as edge weight. The pressure loss is 
roughly proportional to the length according to the Darcy-Weisbach equation. Especially in the low-speed 
ductwork design, a constant pressure loss per unit of duct length is commonly assumed. Moreover, the algorithm 
ignores many geometric constraints, such as the collision of equipment and ducts in the ceiling void, or the 
oversized duct that cannot fit in the shaft. Since the generative zoning focuses more on space/system topology 
than the construction document, it is omitted for our current work. 


6. MODEL SCRIPTING 


To evaluate the initial cost, energy consumption, and the control effectiveness of zoning schemes at the space 
level, the co-simulation binding Modelica and EnergyPlus is an ideal choice. Spawn (of EnergyPlus) is the latest 
whole-building energy simulation engine developed by the U.S. Department of Energy, National Labs and 
industry. It reuses the envelope and daylighting modules of EnergyPlus and couples them with the AC system 
and control modules from Modelica Buildings Library (Wetter, 2020). This division of tasks optimally leverages 
EnergyPlus for efficiently solving multi-zone building physics—a task can be time-consuming for Modelica. And, 
because EnergyPlus takes thermal zone as the basic simulation unit, Modelica is needed for inspecting different 
behaviors of spaces within a thermal zone. 


Pipe/duct network 


x ee y n (47 generation 


Geometry 
Pre-processing 


o Zoning preeooeereeoeee=- 


Schemes 


Thermal Analysis fe => = «e am a» em am em en congo cm eon em sm emn 0 on om em er eo 


Thermostat position: System template: 


1, Worst space 1. Electric heater + Fan 
2. Return duct 2. Rooftop AC Unit 


3. Fan-coil recirculation 
4. VAV + Electric reheat 


VAP SS Ak 
FY OOH 
a ee o 


Model scriptin: 
tS) and cosimulaton 


Fig. 11. Workflow of model scripting in developed Grasshopper components. 


Based on the typical-day load results from EnergyPlus, the nominal airflow rate is calculated for each space. 
Then, the algorithm does the equipment and ductwork sizing to meet the nominal airflow. It applies the Equal 
Friction method to decide the diameter and the pressure loss, which mimics the decision process of engineers. 


Following the algorithms in section 2.4, the two-level distribution network, accompanied the functional spaces, 
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is first serialized in a JSON file (component GenNetwork in Fig. 11). Then, joined with the system and 
simulation configuration, the JSON is further parsed into the Modelica scripts. The user may customize different 
systems and control templates in the component SysMockup. When component ZipMO is toggled on, the 
program will call OpenModelica Shell to perform the simulation and analyze the result. 


1 Jevel network -|< 2> level network - | - control logic 


freshAir 
pAMeCT Other zone 
ui @- rooi Be LEELEE EE connect nes 
/ ra 
a 
| | wessureDroy stal 
|” p A p mSet 
» = 
T > | | mosstionsowce l {2 
(idf }— bldg../ | stad 
Te | (002. = = = = hea H 
.epw FPN fe ah 
\ s | dir H 
| thermaZone. idealyeiing’ 'ooling, 
\ 


qSet booleanToReal hysteresis 


tSet 


; ) conPID 
At F~ — 1003; = = — — d 
5 Ife {ca} fur 


junction simpleControter. PL pulse 


Fig. 12. Asample connection of Modelica model 


Fig. 12 gives an example of the generated Modelica model, demonstrated by a recirculation fan-coil system with 
the water heater and two-speed fan. Each ‘thermalZone’ component represents a functional space (altogether 43 
conditioned spaces), with specific air leakage and fresh air ventilation rate. It exchanges temperature data with 
the corresponding zone in the EnergyPlus by Spawn. The PI controller takes the temperature as input from the 
sensor mounted on the return duct, and controls the indoor temperature around 21°C during working hours 
(9:00~18:00). Above the control zone level, a water loop connects all terminal equipment. The fan/pump and 
heat exchanger are ideal models to give rough estimations. They can be replaced with actual models given 
detailed parameters, such as performance curves. 


Table 2: Simulation results of sample cases 


“Uncomfortable Unmethour Fan Power Heat | Duct _—*Pipelength Model Simulation — 
Schemes hour (h) (h) (kWh) (kWh) area (m2) (m) AHU Equations Time (s) 
Fig. 4 726 1050 30.82 3317.04 53.60 34.38 9 16857 212 
Fig. 7 (a) 1294 1970 35.66 3298.66 78.33 17.54 8 16662 249 
Fig.7(b) 122 298 33.47 3349.66 24.38 42.67 22 19392 349 
Fig. 7 (c) 101 276 34.38 3349.10 17.61 50.75 26 20172 277 


The workflow has been tested against 4 zoning schemes for 14 winter days in Chicago. There is a clear trend in 
Table 2 that the unmet hour decreases along with the increase of control zones. Normally, a control bias of +1°C 
is acceptable for comfort conditioning. Hence, the unmet hour accumulates the working hours where the 
temperature is outside the range of 20~22°C. Similarly, the uncomfortable hour is calculated for temperatures 
outside the comfort range (20~25°C). In the last case, Fig. 7 (c), 101 uncomfortable hours indicate that one space 
may get unconditioned for 10 minutes a day on average. However, if 19°C is acceptable, the uncomfortable hour 
will be zero. Given proper equipment costing spreadsheet, this workflow can offer insights into the advantages 
and disadvantages of different zoning schemes, in terms of comfort level, initial cost and future energy bills. 
Such information can facilitate the decision-making process. 


7. DISCUSSIONS 


This paper introduces a semi-automatic workflow for thermal zoning, the AC distribution system generation, and 
model scripting. Such a workflow can assist engineers in exploring various zoning schemes and system layouts, 
taking a step forward toward the generative design driven by simulation. Additionally, it incorporates two unique 
views into the space view model. One is the thermal response view of spaces (Fig. 6), enabling designers to 
quickly zone the spaces by color similarity. The other is the potential distribution network view (Fig. 10) that 
outlines the cost of delivering energy to terminal spaces. However, there are several topics not addressed in this 
paper, left for future work: 


1) The space ontology needs to be enriched, to describe more space relations such as the tenant zone and fire 
compartmentation. Different tenants must be considered during thermal zoning because their energy 
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consumption is metered separately. Also, ductwork can be a fire hazard for spreading heat between two 
compartments. 


2) The algorithm for the 1*-level network can be improved, to consider the capacity of terminal equipment and 
the hydronic balance. It is preferable to evenly distribute the cooling/heating load of zones across multiple 
risers, ideally making the riser the root node that a tree grows from. If there are outlying spaces in the 
floorplan with unique load profiles, the algorithm is better to isolate them as distinct system zones (with a 
direct expansion system for example). 


3) More system and control templates are required, for a wide range of system performance comparisons. To a 
certain degree, the current 2-level network has some universality. Without the 1‘'-level network, it can be the 
ductwork of rooftop packaged units. With 1‘-level modeled as water pipes and 2"¢-level as air ducts, 
different proportions of water and air can resemble systems from fan-coil (less or no duct) to AHU (more 
duct). When modeled as air ducts, it suits the constant or variable air-volume system. Each system has 
specific configurations for model scripting, such as 2-pipe/4-pipe fan-coil, VAV-box reheat, or outdoor air 
duct. 


4) The algorithm lacks the ability to evenly lay out the diffusers and ductwork in an open space. Nevertheless, 
when the open space has specific function zones allocated, it resembles a multi-room floorplan, with 
physical partitions replaced by air walls. Under that condition, the pipeline still works given a proper 
assumption of the airflow rate between spaces (for EnergyPlus modeling in thermal analysis and co- 
simulation). 
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GOING BEYOND ENERGY CONSUMPTION: DIGITAL TWINS FOR 
ACHIEVING SOCIO-ECOLOGICAL SUSTAINABILITY IN THE BUILT 
ENVIRONMENT 


Dragana Nikolić & Ian J. Ewart 
University of Reading, UK 


ABSTRACT: Digital twins have attracted much of the attention from the researchers and policy makers as a 
potent industry-agnostic concept to support ambitious decarbonization goals. Consequently, much of the latest 
research has focused on computational methods for building and connecting digital twins to monitor and measure 
energy consumption and resulting emissions from buildings. At the same time, it has been recognized that 
achieving a truly sustainable built environment goes beyond environmental sustainability and is much more 
complex, calling for approaches that transcend any single discipline. Initiatives such as the National Digital Twin 
in the UK and globally, begin to offer a long-term vision of interconnected, purpose-driven and outcome-focused 
digital twins, grounded in systems thinking. Such approaches recognize the economic, social and ecological layers 
as critical data components in these digital ecosystems for understanding the built environment as a whole. Yet, 
social and ecological sustainability will remain difficult to address without involving allied disciplines and those 
from the realms of sociology, ecology, or anthropology in a conversation about the critical data sitting at the 
intersections between human behavior and technological innovation. In this paper, we review and discuss the state 
of the art research on digital twins to identify the disciplines dominating the narrative in the context of a 
sustainable built environment. We unpack a techno-rationalist view that emphasizes the sole reliance on 
technology for problem-solving and argue that by going beyond energy consumption and carbon emissions, digital 
twins can facilitate a more nuanced assessment of sustainability challenges, encompassing social equity, cultural 
preservation, and ecological resilience. 


KEYWORDS: Digital twin, socio-ecology, sustainability, smart city, review. 


1. INTRODUCTION 


The alarming effects of climate change and environmental degradation have prompted various global policies to 
set ambitious targets for reducing carbon emission by 2050 (Climate Change Committee, 2019; United Nations 
Environment Programme, 2022). The urgency of climate change as well as the recent pandemic have raised many 
questions of what the future of the built environment should look like and how that future can be envisioned and 
accomplished. Carbon emission targets or achieving “net zero” have thus prompted many digital transformation 
initiatives as a way to mobilize technology and data science to monitor, simulate and evaluate possible solutions 
across sectors to meet the decarbonization goals and improve overall performance. In the built environment 
disciplines and construction specifically, one such initiative that has attracted much attention is the concept of 
digital twins as a way to connect physical and digital assets to support data-driven decision making in complex 
environments. In the UK for example, the National Digital Twin Programme (CDBB, 2019), offers a broad vision 
of connected digital twins across environmental, social and economic spheres driven by an ultimate goal of 
enabling people and systems to flourish. This shift has also challenged built environment practitioners to consider 
the long-term consequences of any interventions (Whyte et al., 2020) and has led to a greater focus on outcomes 
rather than outputs, and a broader digital context within which project data can be situated, for example in the 
context of ‘smart cities’. 


Yet, given that the global demands for energy are increasing, the pursuit of carbon emissions reduction has 
consequently focused efforts on understanding and reducing energy consumption in the contexts of infrastructure 
and building performance. However, research points out that responding to the climate challenge is far more 
complex, or a “super-wicked” problem that defies simplistic technological solutions and often prioritizes short- 
term goals with competing priorities (Levin et al., 2012; Rabeneck, 2008). Achieving a truly sustainable built 
environment is much more complex, calling for approaches that transcend any single discipline and move away 
from project-bound methodologies to those where developed models span organizational and jurisdictional units 
(Whyte et al., 2019). As Rabeneck (2008) argues, any understanding of asset performance demands a systems 
perspective to better articulate needs within a given context. Initiatives such as the National Digital Twin in the 
UK and globally, begin to offer a long-term vision of interconnected, purpose-driven and outcome-focused digital 
twins, grounded in systems thinking. Such approaches recognize the economic, social and ecological layers as 
critical data components in these digital ecosystems for understanding the built environment as a whole. Yet, social 
and ecological sustainability will remain difficult to address without involving allied disciplines and those from 
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the realms of sociology, ecology, or anthropology in a conversation about the critical data sitting at the intersections 
between human behavior and technological innovation. 


In this exploratory paper, we review recent research around large-scale digital twins for the built environment and 
argue that while promoted as a potent industry-agnostic concept to support ambitious decarbonization goals, the 
narrative has been dominated by technology-focused methods to meet such goals. We identify literature that raises 
critical considerations for informing the holistic approaches to developing long-term purpose-driven digital twins 
for a sustainable environment. We review the relevant literature to unpack a techno-rationalist view that 
emphasizes the sole reliance on technology for problem-solving and argue that by going beyond energy 
consumption and carbon emissions, digital twins can facilitate a more nuanced assessment of sustainability 
challenges, encompassing social equity, cultural preservation, and ecological resilience. We explore a set of 
underlying assumptions and considerations such as the authority of the data, system complexity and cross-sector 
boundaries, and the technology landscape and procedures to enable constructive questioning. This approach allows 
us to understand how digital twin applications are subject to dominating business cases driving their development, 
which can consequently affect the design and operation of built environment projects. Moreover, we take the view 
that it is becoming increasingly difficult to sustain the traditional compartmentalized practices, but it is becoming 
imperative to promote conversations between the allied built environment and social disciplines to avoid single- 
issue dominance that could lead to unintended consequences, furthered by partially informed policies (Whyte et 
al., 2020). This has consequences for how digital technologies are used, demanding new and different kinds of 
data and processes, providing new challenges to the construction informatics research community and to 
practitioners. 


2. THE SOCIO-TECHNICAL LANDSCAPE IN THE BUILT ENVIRONMENT 


The pervasiveness of digital technologies across architecture, engineering, and construction practices as seen 
through a convergence of material science, robotics, 3D printing, sensors, artificial intelligence, and other 
technologies, presents new digital capabilities that connect physical environments with digital ecosystems. 
Technological innovation has always been paired with urban development (Quek et al., 2023), although in recent 
years, the concept of technology has shifted inexorably towards the digital and the view that the world, and reality 
itself is no longer analogue, but is made up of a digital representation of itself (Ewart, 2018). While the proliferation 
of low-cost consumer-market technologies paired with big data and Internet of Things has offered an enticing 
world of opportunities to improve the design, delivery, and operations of physical assets, it has also raised 
questions about how to make sense of the ever-growing raw and complex data sets to understand how we use the 
built environment and make informed decisions about its future (Nikolić & Whyte, 2021). 


The concept of digital twins in the built environment practice has grown out of the recognition that the delivery of 
physical assets has become inseparable from the delivery of its digital counterpart and with a potential for an 
extensive data-capture to understand its use and improve its operation. With real-time asset data enabled, 
information received can influence future investment decisions, especially for serial clients such as governments, 
and aim to either change user behaviors or assets in new project interventions (Whyte & Nikolić, 2018). The digital 
twin idea was first introduced in 2002 in aerospace as a concept for Product Lifecycle Management (Grieves, 
2019) and its use remains predominantly in manufacturing. Recent applications in the built environment include 
smart city initiatives, structural health monitoring, infrastructure planning and management (e.g. power, water, 
transportation), agriculture, and urban planning and development. In construction, the development of digital twins 
gained traction only in the last five years (Opoku et al., 2021), though not without challenges (Opoku et al., 2023). 
Urban infrastructure and 3D city models moving beyond geometry and information have started to become 
developed around the same time although mostly by linking BIM models with data (Ferré-Bigorra et al., 2022). 


Unlike in the aerospace and manufacturing domains, digital twins for the built environment can span greater scales, 
professional domains and jurisdictional units, with an increasing complexity due to the heterogeneous data sources 
and sub-system interactions, leading to the difficulty of reliably predicting the system performance. For example, 
urban planning and management increasingly relies on understanding interactions between natural, cyber-physical 
and social systems in the form of urban digital twins (UDT) to foster human-centered resilience (Ye et al., 2023). 
Digital twins at city and urban scales can offer insight into how we use the built environment and inform the 
decisions for future interventions, yet their development is much more complex compared to DTs at building and 
component scales. Urban environments and cities are dynamic living systems that constantly evolve (Quek et al., 
2023) and any interventions in this complex system will be intricately tied to economic and social sustainability 
goals as much as environmental. Ultimately, as Grieves (2019) argues, the success of digital twins will need to 
create value for the users of the systems, generally defined through value propositions or “use cases”. 


There is a tension between the grand challenge of setting broad sustainability goals and the practical challenge of 
a system-of-systems approach necessary for addressing them. For sustainable development, some of the recent 
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reviews of digital twin applications and research (e.g. Papadonikolaki & Anumba, 2022) reveal that while holding 
a promise of a method to mitigate and adapt to environmental changes, the focus has been mostly on the 
decarbonization efforts in the energy sector and reducing energy consumption across the domains, including 
buildings. The research on design and delivery of buildings has encapsulated such efforts through increasing 
energy performance and reducing waste, although under the changing terminology of green, smart, high 
performance, carbon-neutral or net-zero buildings (Bonci et al., 2019; Gultekin et al., 2013; Korkmaz et al., 2010). 
A general survey of digital twin applications in design and construction domains, however, reflects rather an 
engineering approach to meeting decarbonization goals through improved sensing, monitoring, material, and data 
science, or predicting and simulating occupant behavior; and approach challenged by the view that the building 
performance is realized over time, rather than predetermined (Green & Sergeeva, 2020). The difficulty of such 
compartmentalized approaches and domain-specific definition of local carbon targets is that the outcomes may be 
insufficient to recognize the impact in a larger context and the system within which such interventions operate. As 
a result, most indicators developed so far have been primarily describing the state of the environment, rather than 
the relationship between society and ecosystems (Azar et al., 1996). 


The dominating narrative around net-zero carbon has prompted predominantly technology-oriented approaches to 
decarbonization, whether they refer to extending renewable energy technologies or improving the energy 
performance of buildings and infrastructure. The global quest for smart products, buildings, cities and systems has 
been met with an ever-growing and more diversified digital ecosystem of software and siloed technological 
developments, a situation that has prompted calls for the technological dimension to be included in the 
sustainability trifecta of economic, environmental and social goals guiding the urban planning and development 
(Quek et al., 2023). However, Waring & Richerson (2011) argued that such environmental challenges are in fact, 
socio-ecological in nature and therefore, designing effective responses will depend on a deeper understanding of 
the human-environmental interactions. Socio-ecological perspective emphasizes societal activities that impact the 
use of resources, rather than on environmental quality indicators with an aim to aid in planning and decision- 
making processes at various administrative levels (Azar et al., 1996). Ince (2023) further suggests that adopting a 
socio-ecological approach and systems thinking with a multidisciplinary perspective can offer new models for 
creating systemic and long-term solutions to sustainability problems. In practice, this would mean thinking and 
modeling that involves all stakeholders, performing economic and biological analyses of the environment and 
resources at micro- and macro scales, and participatory approaches to environmental policy design (Ince, 2023). 
Such perspectives invite dynamic systems thinking approaches that span spatial, temporal and organizational 
scales and considers a set of critical resources such as natural, social, economic and cultural, all located at the 
intersection of interdisciplinary collaboration, moving away from short-term narrowly focused technological fixes. 
In this context, there is somewhat of a paradox of technological optimism where technical fixes are viewed as 
solutions to all problems, even those that are non-technical in nature, while social and economic factors are viewed 
as obstacles, rather than essential to designing solutions (Rudolph, 2023). This is further exacerbated by the 
plethora of isolated pilot studies and the dominance of industry-backed funded research which is unlikely to lead 
to truly transformative socio-ecological thinking of transdisciplinary work (Rudolph, 2023). 


3. METHOD 


To explore the current narratives in research associated with socio-ecological systems thinking in the domain of 
digital twins, we conducted an initial level of a systematic review where we first identified bibliographical sources 
that focus on urban or city digital twins as we were interested in the broader scale digital twins to understand the 
system complexity. In the search, we excluded studies that were in the physical science areas, such as mathematics, 
physics, chemistry or medicine. We conducted a search of the Scopus database to sample articles and studies using 
the following sampling string: 


TITLE-ABS-KEY ( "city" OR "urban" OR "built environment" OR "smart city" AND "digital twin" ) AND 
TITLE-ABS-KEY ( "social" OR "ecolog*" OR "sustainab*" OR "net-zero" ) AND 


( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "cp" ) ) AND ( LIMIT-TO ( LANGUAGE , 
"english" ) ) AND (EXCLUDE ( SUBJAREA , "phys" ) OR EXCLUDE ( SUBJAREA , "math" ) OR EXCLUDE 
( SUBJAREA , "medi" ) OR EXCLUDE ( SUBJAREA , "ceng" ) OR EXCLUDE ( SUBJAREA , "neur" ) OR 
EXCLUDE ( SUBJAREA , "chem" ) ) 


The search yielded 153 publications including journal articles (92) and conference papers (61), all published 
between 2018-2023 (Fig. 1). Publications in the area of social science are among top five, flanked by those in the 
areas of engineering, computer and environmental sciences, and energy (Fig. 2). Lastly, it was interesting to 
observe that the significant funding in this area comes from the European funding schemes, followed by the 
national science funding programs in the U.S. and China (Fig. 3). 
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CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 
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Fig. 1: Number of publications on urban and city digital twins per year 
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Fig. 3: Publications by funding sponsor 


However, the review of abstracts revealed a wide range of approaches and methods with a high degree of varying 
conceptualization of problems and definition of digital twins, leading to a considerable number of papers being 
omitted from further review. For example, papers that approached digital twin models from an engineering 
perspective or conceptualized them as a single system with no links made to either social or ecological issues were 
out of scope (e.g. water system focusing on flood risks). Similarly, papers focusing only on economic, 
technological or social aspects were omitted as well. Lastly, papers that only focused on digital models that did 
not interact with their physical counterpart were not considered to be digital twins as defined. This has led to a list 
of 25 publications that were selected for further review to identify themes, considerations, developments and 
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challenges for developing complex socio-ecologically driven digital twins (Table 1). 


Table 1: Select publications with key data. 


No 
1 


23 


24 


25 


Reference 


Yigitcanlar T. et al. 
(2019) 

Sadowski J. and 
Bendor R. (2019) 
Dembski F., et al. 
(2020) 

Goel R.K., et al. 
(2021) 

Shahat E. et al. (2021) 


Yossef B. & Aharon- 
Gutman M. (2022) 
Benedetti A.C., et al. 
(2022) 


Tzachor A., et al. 
(2022) 


Corrado C.R. et al. 
(2022) 


Charitonidou M. 
(2022) 

Ferré-Bigorra J. et al. 
(2022) 

Bozeman J.F. et al. 
(2023) 

Ye X. et al. (2023) 


Peters, D. and 
Schindler, S. (2023) 
Kumalasari D. et al. 
(2023) 

Al-Sehrawy R. et al. 
(2023) 

Masoumi H. et al. 
(2023) 

Quek H.Y. et al. 
(2023) 

Dembski F. et al. 
(2019) 


Wan L. et al. (2019) 
Mohammadi N., et al. 


(2020) 
Yue A. et al. (2022) 


Zou S. et al. (2022) 


Akimov L. et al. 
(2023) 


Cruz P. et al. (2023) 


Title 


The making of smart cities: Are Songdo, Masdar, Amsterdam, 
San Francisco and Brisbane the best we could build? 

Selling Smartness: Corporate Narratives and the Smart City as 
a Sociotechnical Imaginary 

Urban digital twins for smart cities and citizens: The case study 
of Herrenberg, Germany 

Self-sustainable smart cities: Socio-spatial society using 
participative bottom-up and cognitive top-down approach 

City Digital Twin Potentials: A Review and 

Research Agenda 

The Social Digital Twin: The Social Turn in the Field of Smart 
Cities 

The Process of Digitalization of the Urban Environment for the 
Development of Sustainable and Circular Cities: A Case Study 
of Bologna, Italy 

Potential and limitations of digital twins to achieve the 
Sustainable Development Goals 


Combining Green Metrics and Digital Twins for Sustainability 
Planning and Governance of Smart Buildings and Cities 


Urban scale digital twins in data-driven society: Challenging 
digital universalism in urban planning decision-making 
The adoption of urban digital twins 


Three research priorities for just and sustainable urban systems: 
Now is the time to refocus 

Developing Human-Centered Urban Digital Twins for 
Community Infrastructure Resilience: A Research Agenda 
FAIR for digital twins 


Planning Walkable Cities: Generative Design Approach 
towards Digital Twin Implementation 

The pluralism of digital twins for urban management: Bridging 
theory and practice 

City Digital Twins: their maturity level and differentiation from 
3D city models 

The conundrum in smart city governance: Interoperability and 
compatibility in an ever-growing ecosystem of digital twins 
The Digital Twin Tackling Urban Challenges with Models, 
Spatial Analysis and Numerical Simulations in Immersive 
Virtual Environments 

Developing a city-level digital twin - Propositions and a case 
study 

Knowledge discovery in smart city digital twins 


Smart Governance of Urban Ecological Environment Driven by 
Digital Twin Technology: A Case Study on the Ecological 
Restoration and Management in S island of Chongqing 

A Preliminary Study on the Development and Application of 
Digital Twin Landscape Architectures in the Context of Smart 
City 

The Environmentally-Efficient 
Respecting Urban Context 


Towards e-Cities: An Atlas to Enhance the Public Realm 
Through Interactive Urban Cyber-Physical Devices 


Canal District Design 


4. EMERGING THEMES AND THE DISCUSSION 


Type 
Article 


Article 


Focus 


Multidimensional 
framework 
Counter-narrative of 
technology salvation 
Practical use of UDT and 
part. engagement 
behavioral intellig., trans- 
disc knowledge 

incl. of socio-econ. 
components 

Complexity theory 


Predictive tool for urban 
planning 


modeling socio-technical 
and socio-ecological 
systems 

metric-driven framework 
for sustainability planning 
of a sociotechnical system 
socio-tech. perspective of 
smart cities 

limitations of city digital 
twins 

social equity and justice, 
circularity, and DTs 
human-centered UDTs 
framework 

sustainable data landscape 


human perspective in 
scenario development 

DT inconsistencies and 
poorly measured priorities 
going beyond 3D viz. an, 
monitoring 

systems and semantic 
integration 

civic engagement 

in urban planning 


theory and policy 
experimentation 
spatiotemporal knowledge 
discovery framework 
Urban restoration 


digital twin landscape 
architecture 


landscape restoration 


heterogeneous urban cyber- 
physical projects case 
studies 


The select list of publications demonstrates that the research focusing on large scale digital twins is still in early 
stages and some of the relevant discussions and debates remain largely embedded within the “smart city” literature. 
From the select list of publications, we sought to identify the application areas, as well as the indicators of the 
systems thinking that extend the environmental performance. In doing so, our goal was to establish the extent of 
socio-ecological and transdisciplinary thinking informing the development of large-scale digital twins and the 
potential obstacles for their implementation. 
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4.1 Urban digital twins and smart cities 


Digital twins at urban scales have been tightly coupled with the smart city narratives where the focus has been 
largely on modeling specific infrastructure needs that include forecasting and preventing of floods, increasing the 
efficiency of power grids, understanding of commuting patterns for transportation, as well as modeling and 
prevention of epidemics in the public health domain. From the literature, such digital twins have been variably 
termed city digital twins (CDT), urban digital twins (UDT), or social urban digital twins (SUDT). The greatest 
challenge, however, is determining how closely the digital twin should be coupled with the real urban environment 
and whether the abstraction and simplification of social or economic datasets could even qualify such models as 
digital twins (Ye et al., 2023). While the promise and the potential for city digital twins to not only mirror and 
interact with the physical counterpart, but also account for social and economic aspects (Wan et al., 2019), fewer 
studies elaborate on the complexity of developing such models or describe the interactions and dependencies 
between the heterogeneous data sets spanning spatial and temporal description of environmental, social, and 
economic factors (Savage et al., 2022). Although digital twins of cities have been developed, it is difficult to 
discern with consistency what systems have been modeled in each implementation and to what extent, further 
confusing the understanding of urban or city digital twins (Ferré-Bigorra et al., 2022). 


What has also become apparent from the review is that the discussion of city, or urban digital twins is tightly 
coupled with the smart city narratives. The relationship between city digital twins and smart cities is not yet clear, 
although the smart city conceptualization as technology-assisted and connected infrastructure and communities 
through sensors and automation closely resembles that of a digital twin. In that context, both digital twins and 
smart cities that are deemed to be successful are those that adopt a system of systems approach and balance the 
sociocultural, geospatial, and institutional perspectives of cities beyond the means of technology solutions (Quek 
et al., 2023; Yigitcanlar et al., 2019). Yigitcanlar et al. (2019) offer a multidimensional conceptual framework that 
centers on urban policy to inform urban planning and development where innovation economy, socioeconomic 
equality, ecological sustainability and (smart) governance, each equipped with their own performance indicators, 
are all critical for building smart cities. 


Some studies expand the use of urban digital twins with sociological approaches by focusing on social issues such 
as urban aging and gentrification, poverty, or other social disparities, termed as social urban digital twins (SUDT) 
(Sadowski & Bendor, 2019; Yossef Ravid & Aharon-Gutman, 2023). Such studies exemplify attempts to integrate 
social fabric with the built urban space, although not without raising ethical and legal questions behind the need 
to collect social data, currently the focus of the field of Digital Sociology (Lupton, 2015). At the same time, the 
criticism of such developments is based on observations that corporations tend to reframe urban sustainability 
challenges that favor narrow economic gains at the expense of socio-ecological sustainability, especially in the 
context of energy consumption and smart grids (Evans et al., 2019; Quek et al., 2023). Nevertheless, all the city 
digital twin developments testify to the complexity of replicating such complex and evolving systems, even at the 
physical levels, which is perhaps one of the reasons for the adoption of technocratic approaches that ignore wider 
social and environmental factors (Kitchin, 2014; Semeraro et al., 2021). 


4.2 Technological optimism and implementation reality 


The development of large-scale city digital twins generally involves the integration of 2D and 3D information and 
data models, such as BIM or GIS, and data sources, such as sensors, Internet of Things, and other solutions that 
form the physical, network and computing layers (Quek et al., 2023; Semeraro et al., 2021). Research on large 
scale city and urban digital twins remains more focused on the software side of modeling the physical environment, 
rather than on participatory planning and policymaking informed by human-centered behavior analysis, an 
approach that would enable planners and policy makers to understand the knock-on effects of environmental 
changes on social resilience (Ye et al., 2023). As the complexity of urban systems that need to be modeled and 
integrated is increasing, so has increased the rate of various siloed technological developments, posing new 
challenges for the city administrations and governance. It has been widely recognized that city digital twins will 
require a transition from single institutions to scalable solutions where multiple professional domains contribute 
the data and inform the relevant analyses (Savage et al., 2022). The technical complexity of integrating various 
data formats, applications, systems and other sub-system DTs has consequently drawn much more attention to the 
technological considerations for resolving such issues. The proliferation of various public and private 
technological research and development efforts have further diversified the digital ecosystem at the expense of 
knowledge sharing and cross-domain collaboration, leaving the development of city digital twins in their infancy 
(Shahat et al., 2021). 
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Though technological challenges remain important to be resolved, the field of smart city and digital twin 
developments have become progressively critiqued for their heavy reliance on technologies as means to manage 
urban and environmental crises (e.g. Nochta et al., 2019; Yossef Ravid & Aharon-Gutman, 2023). Advanced smart 
city initiatives, such as Singapore! or Beijing” for example, increasingly embed new technologies into city design, 
retrofitting or upgrading their infrastructure, which presents challenges for the city's phased developments and the 
pace of technological developments. As Quek et al. (2023) illustrate, cities develop at a much slower pace than 
technologies, whereby the time projects are completed, the technology solutions may well become outdated. This 
further exacerbates the existing challenges of integration, interoperability, and compatibility, perpetuating the 
cycle of pursuing technological solutions to technology-created problems. Some studies have pointed out the 
challenge of profit-driven corporate interests seeping into the social realm by appropriating and dominating the 
narrative of urban challenges and technological fixes (Sadowski & Bendor, 2019; Yossef Ravid & Aharon- 
Gutman, 2023). This complex technological influence on urban governance where social perspective has been 
largely absent, presents academics, professionals and policy makers with a real challenge of working together to 
enable outcome-based and value-driven decision making that drives more comprehensive social, environmental 
and economic values. The ever-growing digital ecosystem of various digital twin technology solutions has 
consequently raised several practical challenges that extend those of interoperability alone. 


4.3 Practical challenges for socio-ecological and systems thinking 


The review of studies revealed a number of both technological and strategic challenges facing the development of 
large scale complex digital twins for addressing socio-ecological and environmental goals. These have been 
broadly categorized into three categories (Table 2) and described further below. 


Table 2: Overview of select hindrances to the development of large-scale digital twins. 


Category Issues Description 
Data Volume/Quality Overproduction of unusable data vs. co-production of socially relevant 
information; data quality; data errors 
Bias Selection bias or misrepresentation of marginalized communities in the 
design and deployment of digital twins 
Availability/Ethics Private, proprietary or other sensitive data, especially social data; security, 
legal and commercial boundaries 
Heterogeneity Domain-specific data types and formats; qualitative vs. quantitative; static 
vs. dynamic; coding and structuring approaches 
Reusability Lessons learned recorded in a machine-readable form; cross-pollination or 
knowledge between projects and domains 
Ownership Enable individuals and communities to envisage and understand data on a 
human scale; calibration of citizens data 
Model Complexity Physical, social, ecological datasets; dynamic spatial-temporal and socio- 
ecological changes 
Optimization Model assumptions are clear; data are transparent; trade-offs and 
contradictions between different targets and outcomes 
Integration Siloed development Within the design, social, and engineering sciences; between research and 
practice; techno-rationalism and corporatization of technology 
Interoperability Integrating multiple GIS, BIM, CIM, 2D and 3D data models; individual 


technology solutions 


Digital divide DT development and integration bound to the available investments and 
resources at district, region or country levels 


Data is the basis of all digital twins and generating, accessing, filtering, analyzing and using relevant data presents 
an array of different challenges for the development and usability of digital twins. There is a general consensus 
across the literature that the overproduction of data, either from the projects or sensors and users, is a problem, 
prompting an increasing reliance on machine learning and artificial intelligence to process and make sense of the 
ever growing volumes of raw data. This vast and unfettered production of data also signals the separation between 
the digital and the human where the suggestion that one of the benefits of the digital revolution is the production 
of ‘big data’, becomes dangerous without recognizing our limited ability to make use of it (Ewart, 2018). On the 
other hand, the needed data may not be easily available due to privacy or proprietary issues. 


l https://Awww.smartnation.gov.sg 
? https://www.beijingcitylab.com/projects-1/43-smart-cities-review/ 


1067 


Most instances of urban digital twins developments are based in 3D models, while the integration of 2D 
information and non-graphical data is far more sporadic. Ye et al. (2023) in their review demonstrate how 
multidimensional visualization of integrated social sensing data, land-use change and demand models for example, 
while essential for planning of future urban landscape, is largely missing. Furthermore, data use is driven by the 
purpose of their primary users reflecting an inherent bias, even when data are quantitative (Nikolić & Whyte, 
2021). In addition, different coding schemes used across the professional domains for describing and structuring 
the data complicates efforts to automate the querying of data (Peters & Schindler, 2023). Still, planning and design 
approaches, especially at broad urban, social, and environmental scales, involve consideration of a range of factors, 
uncertainties, and conflicting goals, all largely part of a decision-making process of which automation is not yet 
capable (Allam & Dhunny, 2019). 


The difficulty of simulating complex systems of systems, such as urban environments has long been debated where 
some have questioned the value of digital twins in the face of simpler monitoring systems (Ferré-Bigorra et al., 
2022). Complexity theory has surfaced as a conceptual framework for studying and designing smart cities (e.g. 
(Yossef Ravid & Aharon-Gutman, 2023) to help deal with issues of uncertainty, diversity and emergence and 
inform policies on ways to cope with unpredictable behavior of urban systems. Creating digital twins with a socio- 
ecological focus necessitates inputs not only from the allied built environment disciplines, but also from the fields 
of sociology, anthropology, ecology and planning, which still remains short in supply. The importance of such 
approaches informing policies is illustrated by Savage et al. (2022) where the changes in energy consumption 
patterns resulting from a combination of measures in carbon tax, technology adoption and land use data would 
affect social inequality in the UK. 


Resulting from the challenges above, the integration of data, models and approaches across domains and scales 
perhaps remains the over compassing challenge to the development of complex and connected digital twins. 
Integration of diverse datasets, models and methods that better account for differences in human behaviors further 
remain underexplored (Delmelle, 2021). It is clear that future tools will have to incorporate different types of data 
from a variety of domains. This however, does not even account for the computational resources needed to process 
such large and complex datasets. The remaining controversy is the viability of urban digital twins and whether 
they can ever truly represent the intra- and inter-social complexity of socio-technical and socio-ecological systems, 
especially those that incorporate societal elements (Batty, 2018). 


5. CONCLUSIONS 


The twin urgency of climate change and sustainable development have propelled explorations of large scale urban 
digital twins as a data-centric, cross-disciplinary platform that could promote better decisions through mutual 
learning, public participation and stakeholder engagement. The concept of urban digital twins and the value of 
data sharing across sector boundaries has been recognized in the UK through a National Digital Twin program 
(CDBB, 2019). Research, however, demonstrates that the urban digital twin conceptualization, development, and 
implementation are still very much in their infancy, while the narratives intersect with those of smart cities. Urban 
environments with a complex interplay of spatial, social, environmental, and economic factors have proven 
challenging for the digital “twinning”, leaving most digital twin developments stopping short of modeling socio- 
ecological and socio-technical systems. When designed well, city digital twins, as other technologies, should 
support human agency and democratize the decision-making process, shifting the balance of authority away from 
the experts alone. Yet, despite their great potential and promise they hold, the vast technological bottlenecks 
exemplified in the issues of data, interoperability, federation, integration, scalability or futureproofing of 
technological solutions have all focused much attention on resolving such issues and at the expense of exploring 
and including the socio-ecological dimensions that may impact the planning and development of scenarios. 


While sustainability encompasses environmental, social and economic aspects, the literature review demonstrates 
a predominant focus on the potential of digital twins to achieve decarbonization goals through carbon and energy 
consumption metrics. In this paper, we began by mapping the latest research on large-scale DT applications across 
domains to describe the elements of the dominant narratives for informing future changes in the built environment 
and addressing the challenges of sustainable development and social resilience. Although socio-ecological 
perspective extends the sustainability trifecta by promoting systems thinking and offering new theory necessitating 
multidisciplinary management approaches, we illustrated how the urban and city digital twin developments remain 
largely domain-specific where projects are yet to be seen as interventions within larger complex systems. 
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Finally, while digital twins offer powerful and novel ways to engage diverse disciplines in shared conversations, 
fragmented practices are maintained not only within traditional and institutionalized modes of working, but also 
discipline-specific tools and technologies designed to handle data at different scales and data needs. Integrating 
such diverse data sets not only requires overcoming issues of interoperability, but also crafting new narratives 
around salient spatial, social, and ecological features aimed at the users most likely to have a say in decisions with 
longer term consequences. 


REFERENCES: 
Allam, Z., & Dhunny, Z. A. (2019). On big data, artificial intelligence and smart cities. Cities, 89, 80-91. 


Azar, C., Holmberg, J., & Lindgren, K. (1996). Socio-ecological indicators for sustainability. Ecological 
Economics, 18(2), 89-112. 


Batty, M. (2018). Artificial intelligence and smart cities. Environment and Planning B: Urban Analytics and City 
Science, 45(1), 3—6. 


Bonci, A., Carbonari, A., Cucchiarelli, A., Messi, L., Pirani, M., & Vaccarini, M. (2019). A cyber-physical system 
approach for building efficiency monitoring. Automation in Construction, 102, 68-85. 


CDBB. (2019). National Digital Twin Programme. https://www.cdbb.cam.ac.uk/what-we-did/national-digital- 
twin-programme 


Climate Change Committee. (2019). Net Zero—The UK’s contribution to stopping global warming. Climate 
Change Committee. https://www.theccc.org.uk/publication/net-zero-the-uks-contribution-to-stopping-global- 


warming/ 


Delmelle, E. C. (2021). Toward a More Socially Impactful Geographical Analysis. Geographical Analysis, 53(1), 
148-156. 


Evans, J., Karvonen, A., Luque-Ayala, A., Martin, C., McCormick, K., Raven, R., & Palgan, Y. V. (2019). Smart 
and sustainable cities? Pipedreams, practicalities and possibilities. Local Environment, 24(7), 557-564. 


Ewart, I. J. (2018). Humanising the Digital: A Cautionary View of the Future. In T. Dixon, J. Connaughton, & S. 
Green (Eds.), Sustainable Futures in the Built Environment to 2050 (pp. 325-335). John Wiley & Sons, Ltd. 


Ferré-Bigorra, J., Casals, M., & Gangolells, M. (2022). The adoption of urban digital twins. Cities, 131, 103905. 


Green, S. D., & Sergeeva, N. (2020). The contested privileging of zero carbon: Plausibility, persuasiveness and 
professionalism. Buildings and Cities, 1(1), Article 1. 


Grieves, M. W. (2019). Virtually intelligent product systems: Digital and physical twins. Complex Systems 
Engineering: Theory and Practice, 175-200. 


Gultekin, P., Mollaoglu-Korkmaz, S., Riley, D. R., & Leicht, R. M. (2013). Process Indicators to Track 
Effectiveness of High-Performance Green Building Projects. Journal of Construction Engineering and 
Management, 139(12), A4013005. 


Ince, F. (2023). Socio-Ecological Sustainability Within the Scope of Industry 5.0. In Implications of Industry 5.0 
on Environmental Sustainability (pp. 25-50). IGI Global. 


Kitchin, R. (2014). The real-time city? Big data and smart urbanism. GeoJournal, 79(1), 1-14. 


Korkmaz, S., Riley, D., & Horman, M. (2010). Piloting Evaluation Metrics for Sustainable High-Performance 
Building Project Delivery. Journal of Construction Engineering and Management, 136(8), 877-885. 


Levin, K., Cashore, B., Bernstein, S., & Auld, G. (2012). Overcoming the tragedy of super wicked problems: 
Constraining our future selves to ameliorate global climate change. Policy Sciences, 45(2), 123—152. 


Lupton, D. (2015). Digital Sociology. Routledge. 


1069 


Nikolić, D., & Whyte, J. (2021). Visualizing a New Sustainable World: Toward the Next Generation of Virtual 
Reality in the Built Environment. Buildings, 11(11), Article 11. 


Nochta, T., Badstuber, N., & Wahby, N. (2019). On the Governance of City Digital Twins—Insights from the 
Cambridge Case Study [Working Paper]. CDBB. 


Opoku, D.-G. J., Perera, S., Osei-Kyei, R., & Rashidi, M. (2021). Digital twin application in the construction 
industry: A literature review. Journal of Building Engineering, 40, 102726. 


Opoku, D.-G. J., Perera, S., Osei-Kyei, R., Rashidi, M., Bamdad, K., & Famakinwa, T. (2023). Barriers to the 
Adoption of Digital Twin in the Construction Industry: A Literature Review. Informatics, 10(1), 14. 


Papadonikolaki, E., & Anumba, C. (2022, October 26). How can Digital Twins support the Net Zero vision? 
[Proceedings paper]. 19th International Conference on Computing in Civil& Building Engineering (ICCCBE),. 
In: (Proceedings) 19th International Conference on Computing in Civil& Building Engineering 
(ICCCBE),. ICCCBE 2022: Cape Town, South Africa. (2022); ICCCBE 2022. 


Peters, D., & Schindler, S. (2023). FAIR for digital twins. CEAS Space Journal. 
Quek, H. Y., Sielker, F., Akroyd, J., Bhave, A. N., Richthofen, A. von, Herthogs, P., Yamu, C. van der L., Wan, 
L., Nochta, T., Burgess, G., Lim, M. Q., Mosbach, S., & Kraft, M. (2023). The conundrum in smart city 


governance: Interoperability and compatibility in an ever-growing ecosystem of digital twins. Data & Policy, 5, 
e6. 


Rabeneck, A. (2008). A sketch-plan for construction of built environment theory. Building Research & 
Information, 36(3), 269-279. 


Rudolph, D. (2023). The question of ‘sustainable’ technology: From socio-ecological fixes to transformations. 
Human Geography, 16(1), 81-86. 


Sadowski, J., & Bendor, R. (2019). Selling Smartness: Corporate Narratives and the Smart City as a Sociotechnical 
Imaginary. Science, Technology, & Human Values, 44(3), 540-563. 


Savage, T., Akroyd, J., Mosbach, S., Krdzavac, N., Hillman, M., & Kraft, M. (2022). Universal Digital Twin: 
Integration of national-scale energy systems and climate data. Data-Centric Engineering, 3, €23. 


Semeraro, C., Lezoche, M., Panetto, H., & Dassisti, M. (2021). Digital twin paradigm: A systematic literature 
review. Computers in Industry, 130, 103469. 


Shahat, E., Hyun, C. T., & Yeom, C. (2021). City Digital Twin Potentials: A Review and Research Agenda. 
Sustainability, 13(6), Article 6. 


Tzachor, A., Sabri, S., Richards, C. E., Rajabifard, A., & Acuto, M. (2022). Potential and limitations of digital 
twins to achieve the Sustainable Development Goals. Nature Sustainability, 5(10), Article 10. 


United Nations Environment Programme. (2022). Emissions Gap Report 2022: The Closing Window—Climate 
crisis calls for rapid transformation of societies. http://www.unep.org/resources/emissions-gap-report-2022 


Wan, L., Nochta, T., & Schooling, J. M. (2019). Developing a City-Level Digital Twin: Propositions and a Case 
Study. International Conference on Smart Infrastructure and Construction 2019 (ICSIC), 187-194. 


Waring, T. M., & Richerson, P. J. (2011). Towards Unification of the Socio-Ecological Sciences: The Value of 
Coupled Models. Geografiska Annaler. Series B, Human Geography, 93(4), 301-314. 


Whyte, J., Fitzgerald, J., Mayfield, M., Coca, D., Pierce, K., & Shah, N. (2019). Projects as Interventions in 
Infrastructure Systems-of-Systems. INCOSE International Symposium, 29(1), 542—542. 


Whyte, J., Mijic, A., Myers, R. J., Angeloudis, P., Cardin, M.-A., Stettler, M. E., & Ochieng, W. (2020). A research 
agenda on systems approaches to infrastructure. Civil Engineering and Environmental Systems, 37(4), 214-233. 


1070 


Whyte, J., & Nikolić, D. (2018). Virtual Reality and the Built Environment (2 edition). Routledge. 


Ye, X., Du, J., Han, Y., Newman, G., Retchless, D., Zou, L., Ham, Y., & Cai, Z. (2023). Developing Human- 
Centered Urban Digital Twins for Community Infrastructure Resilience: A Research Agenda. Journal of Planning 
Literature, 38(2), 187-199. 


Yigitcanlar, T., Han, H., Kamruzzaman, Md., Ioppolo, G., & Sabatini-Marques, J. (2019). The making of smart 
cities: Are Songdo, Masdar, Amsterdam, San Francisco and Brisbane the best we could build? Land Use Policy, 


88, 104187. 


Yossef Ravid, B., & Aharon-Gutman, M. (2023). The Social Digital Twin:The Social Turn in the Field of Smart 
Cities. Environment and Planning B: Urban Analytics and City Science, 50(6), 1455—1470. 


1071 


APPLICATION OF THE INTERNET OF THINGS (IoT) FOR ENERGY 
EFFICIENCY IN BUILDINGS: A BIBLIOMETRIC REVIEW 


Nnaemeka Nwankwo & Dr. Ezekiel Chinyio 
University of Wolverhampton, School of Architecture and Built Environment, Wolverhampton, United Kingdom. 


Dr. Emmanuel Daniel 
University of Wolverhampton, School of Architecture and Built Environment, Wolverhampton, United Kingdom. 


Dr. Louis Gyoh 
University of Wolverhampton, School of Architecture and Built Environment, Wolverhampton, United Kingdom. 


ABSTRACT: Buildings are experiencing tremendous transformation, where Internet of things (IoT) is been used 
to transform traditional buildings into smart structures. While there are viable IoT techniques, developing IoT 
applications and operations to fully realise the technology's promise is needed. This may be done successfully by 
bridging the gaps in the present research to establish a foundation for future investigations. This study analysed 
extant literature in IoT (between 2008 and 2022) through a bibliometric review to tease out critical measures for 
their integration and transformation. The study adopted a science mapping quantitative literature review approach 
and employed bibliometric and visualisation techniques to systematically investigate data. The Scopus database 
was used to collect data and VOSviewer software to analyse the data collected to determine the strengths, weights, 
clusters, research trends in IoT: Important findings emerging from the study include recent literature by various 
researchers on IoT applications in buildings. The shift in recent patterns of research from developed to developing 
countries. Eighty-nine (89) keywords were analysed and divided into six clusters. Each cluster is discussed to 
present its research area and associated future studies in relation to Smart buildings. This paper uses bibliometric 
analysis to unpick recent trends in IoT and its relevant application to buildings. The paper provides a blueprint 
for future IoT research and practice, needed awareness and future strategy directions for IoT applications in 
construction. This creates opportunities to transition to more sustainable construction sector. 


KEYWORDS: Bibliometric review, Energy efficient buildings, IOT (Internet of Things), Literature review, Smart 
buildings, sustainability, science mapping. 


1. INTRODUCTION 


Massive challenges caused by rapid digitalization have greatly increased the demand for energy (Al-Obaidi et al., 
2022). Energy consumption around the world is estimated to increase by 56% in 2040 (Energy Information 
Administration (EIA), 2013). Internationally, there are efforts to reduce energy consumption in buildings and cities 
such as the EU’s 2050 roadmap which aims to lessen energy and gas emissions by approximately 40% (Fragkos 
et al., 2017). Buildings, both residential and commercial, have played critical roles in human existence by 
providing convenient, safe, and satisfying venues for emotional, physical, and social requirements. Building 
inhabitants should constantly feel secure and protected, since this might affect their general well-being and 
productivity (Lawal & Rafsanjani, 2022). As a result, real-time monitoring, control, and management of a building 
and its inhabitants, components, appliances, systems, environment, and health is critical (Rafsanjani et al., 2018; 
Ghahramani et al., 2020). This emphasizes the need of automation in both household and business settings. Smart 
buildings are unique structures that employ intelligent automation for their operations to provide efficient, pleasant, 
and secure environments for its users. Building automation utilizing the Internet of Things (IoT), a renowned 
advanced technology, can provide cutting-edge solutions for strengthening security and safety, remoting 
appliances/systems, monitoring occupants, increasing efficiency, and improving visual and thermal comfort 
(Kanan et al., 2018; Saha et al., 2018). 


Although there have been various literature on IoT in the context of the buildings (Gholamzadehmi et al., 2020; 
Al-Obaidi et al., 2022; Wang et al., 2021; Bola et al., 2019; Mataloto, Ferreira & Cruz, 2019; Lawal & Rafsanjani, 
2022), only few studies have sought to summarize the existing research using bibliometric techniques. For 
example, Gholamzadehmi et al. (2020) conducted a review of adaptive-predictive control strategy for heating 
ventilation and air conditioning (HVAC) systems in smart buildings. However, the scope of their study on smart 
buildings is focused solely on smart control of building energy services. Al-Obaidi et al. (2022) carried out a 
systematic review of IoT for energy efficient buildings and cities from a built environment perspective analyzing 
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literature published between 2020-2022. However, their study scope is too broad and lacks exclusive statistical or 
quantitative focus on IoT research based on buildings. Bola et al. (2019) presented a critical survey of IoT based 
automated energy management in buildings. They reviewed various IoT applications in the area of building energy 
management and energy consumption data were recorded, which they highlight as a very important consideration 
in system planning and rehabilitation. Mataloto et al. (2019) presented efforts on optimizing energy consumption 
in buildings by use of an IoT based platform known as LoBEMS (Lora Building and energy management system). 
They developed an approach that helps local administration entities identify savings from personalized data 
visualization. Wang et al. (2021) conducted a thorough analysis of the extant literature on IoT and edge computing 
in different application fields, including smart homes and smart cities. Lastly, Lawal and Rafsanjani (2022) applied 
a systematic review while exploring the trends, benefits, risk and challenges of IoT implementation in residential 
and commercial buildings and highlighted that IoT is a crucial driver for the evolution of various types of 
buildings. 


Even though each of these reviews provides a wealth of valuable insights, no thorough and timely review utilizing 
bibliometrics, focused solely on smart buildings can be found in the literature, which presents an important 
research gap. Since the academic literature in the field of IoT has had significant growth, the application of a 
quantitative review approach is required to better understand the knowledge structure of the field (Rivera & Pizam, 
2015). Researchers should try to occasionally examine the accumulated body of knowledge as study fields develop 
and become more complex according to Ferreira et al. (2014), as well as understand new contributions, research 
trends and traditions, topics being studied and investigative the structure of knowledge and future research 
directions. 


This study analyses extant literature in IOT and its application to buildings (between 2008 to 2022) through a 
bibliometric review, to tease out critical measures for their integration and transformation. The objectives are to 
evaluate the global research trends of IOT application in construction based on citation analysis of countries and 
co-occurrences analysis of author keywords cluster, using Vosviewer document analytic software and Scopus 
database. The study findings would benefit the academic community as it contributes to (1) providing valuable 
directions by examining the bibliometric status of IOT in the built environment sector from the existing literature, 
identifying the knowledge areas with links for their integration and (2) identifying the critical areas needed to 
advance IOT application in buildings in future studies and to support practical implementation. 


2. METHODOLOGY 


Researchers commonly employ three approaches to evaluate literature, according to Zupic and Cater (2015): (1) a 
qualitative approach of a systematic literature review, (2) a quantitative approach via meta-analysis, and (3) science 
mapping (based on the quantitative approach utilizing bibliometric methodologies). The third technique is viewed 
as the most suited for assessing the state-of-the-art literature of a research topic and is quickly becoming 
increasingly popular in numerous disciplines of study (Tavares-Lehmann & Varum, 2021). Science mapping uses 
bibliometric approaches such as citation analysis to assist academics in identifying trends in the structure and 
dynamics of scientific subject topics. Using the bibliometric approach in scientific literature reviews enhances 
rigor and lowers researcher bias (Cavalieri et al., 2021). It is superior to typical literature reviews in that it provides 
for a more objective and methodical selection and assessment of scientific research on a specific topic (Cobo et 
al., 2015). To fulfil the study's goals and objectives, we used a bibliometric technique that includes three stages of 
review: (1) data collecting, (2) analysis and visualization, and (3) interpretation, like a prior study by (Obi et al., 
2023). 


2.1 Data collection 


A search query, selection of relevant database(s), and data screening are all part of the data collecting process (Ari 
& Cuccurullo, 2017). Employing the correct search phrases in a bibliometric study is important to success (Obi et 
al., 2023). According to Lawal and Rafsanjani (2022) we followed the search terms for IOT application in buildings. 
They chose keywords for the IOT research after conducting a thorough review of earlier relevant studies on the 
definition and application of IOT in various kinds of residential and commercial buildings and they compiled a list 
of important terms that are used interchangeably. As a result, a mixture of appropriate search phrases was employed, 
and the whole search code is as follows: 


“IOT buildings” OR “Internet of Things buildings” OR “smart buildings” OR “Intelligent Buildings” OR 
“automated buildings”. 
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We identified a database that contained bibliometric data. Scopus and Web of Science (WoS) are now prominent 
databases for retrieving publications (Obi et al., 2023). The Scopus database was used to extract and collect 
bibliographic data for the study. Scopus is a digital bibliographic platform widely recognized for high quality 
standards and a frequent instrument for doing construction-related bibliometric research (Patel et al.,2021). Rani 
and Kumar (2022) conducted bibliographic analyses and identified Scopus as a favored alternative for IOT 
application review research. Similarly, recent literature reviews in IOT research (Lawal & Rafsanjani, 2022; Al- 
Obaidi et al., 2022) have employed the Scopus database. 


To screen the obtained data, we used a set of inclusion and exclusion criteria (relevance, language, and quality). A 
total of 26,512 papers were returned because of the search in the Scopus core collection. The publication period 
was limited to 2008 to present (2022) and was chosen because the classification by year of publication shows a 
growing trend of articles published in relation to IOT within this period. Figure 1 shows distribution of articles by 
year of publication. From this image it can be said that publication on IoT related to building applications picked 
up in 2005 and had a more significant number of publications from the year 2008. The year 2008 recorded 52 
articles whilst the subsequent years recorded a rise in the number of articles published progressively and 
considering the increasing number of articles over the years, with the constant rise in publications, it can be inferred 
that literature on IOT application will continue to increase in years to come. 


Finally, Papers from subject areas with no strong affiliation to application of IoT in buildings like chemistry and 
decision sciences were eliminated. Non-English publications in the relevant topic areas were removed to avoid 
translation difficulties and to decrease ambiguity in essential ideas. Associated keywords presented by Scopus 
because of searching for relevant documents were skimmed through to identify duplicates and words with no 
strong link to the research area were subsequently excluded. To ensure the quality of the papers utilized, only peer- 
reviewed article publications and reviews were included. After that, the author performed further skim readings of 
the title, abstract, and selected document, resulting in the elimination of papers not connected to IOT application 
in buildings. Applying the relevance, language and quality criteria resulted to 21,637 papers being deleted during 
the process, leaving 4,875 articles used for the analysis. These articles were then exported to excel from Scopus 
(in the order of most cited to least cited documents) to allow the implementation of an analysis software 
(VOSviewer). 


Documents by year 


Yea 


Fig. 1: Showing a growing trend of research on IoT application to construction. (Source: scopus). 


2.2 Data analysis and visualization 


This paper makes use of citation and co-occurrence analysis. Commonly used bibliometric methodologies, 
according to Mas-Tur et al. (2021), are: 


(1) Co-occurrence analysis which evaluates the conceptual structure of knowledge in the subject, finding relevant 
keywords and themes related with the primary concepts of research. 


(2) Citation analysis estimates the impact of publications, authors, journals, or nations based on citation rates. 


The bibliographic information was presented using the visualization of similarities (VOS) viewer software version 
1.6.19. VOSviewer allows you to map, visualize, and identify the network structure in research (Leydesdorff & 
Nerghes, 2017). Because of the ease of understanding, presentation, and visualization of the maps, it was chosen 
above other regularly used tools such as Pajek and Citespace. The network is composed of distance-based maps, 
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with the distance between two elements reflecting the intensity of their link. A shorter duration often suggests a 
stronger bond. The size of the item label reflects the number of instances of the phrase discovered. A bigger label 
size indicates that the related item appears in more publications, while various colours reflect distinct groupings 
of items aggregated by VOSviewer's clustering approach (Yin et al., 2019). 


3. RESULT AND DISCUSSION DATA ANALYSIS AND VISUALISATION 


This section presents the bibliometric and network analysis results as tables and networks and a discussion of the 
various results gotten from the analysis. 


3.1 Citation analysis 


Citation analysis is used to identify high-impact journals and significant nations in IoT research. The number of 
publications and citations are used to assess the influence and quality of research in a certain topic (Wuni et al., 
2020). 


3.1.1 | Countries involved in researching the application of IoT in construction. 


The minimal number of citations and publications was set at 10 and 5 respectively, using VOSviewer. This was 
done to guarantee that only nations that are actively engaged in research on IOT application for buildings are 
chosen. 72 out of the 160 nations available were chosen to meet the criteria; the findings of the analysis are shown 
in Figure 2. There are 10 most productive countries leading in research on IoT in relation to building applications. 
The republic of China with a total of 730 documents and 35,390 citations. It is followed by United States of 
America with a total of 716 documents as well as other nations within the top ten as shown in Table 1. Among the 
ten nations China and India are the only developing countries, which is similar to results gotten in a review study 
by (Al-Obaidi et al., 2022). 


Table 1: Top 10 most productive countries involved in the research of IoT application to building construction. 


Country Document Citation 
China 730 35,390 
United States of America 716 21,163 
United Kingdom 323 14,280 
Italy 260 11,550 
South Korea 226 8,480 
India 225 6,343 
Canada 195 10,701 
Australia 182 6,612 
Spain 164 5,942 
Germany 156 6,450 


The closer the colour is too yellow, the recent the investigation is in literature, as seen in Figure 2, from countries 
like United States, Morocco, Nigeria and Pakistan. This demonstrates that recent study patterns are majorly 
researched by developed nations but are also shifting towards developing countries, particularly those seeking 
sustainable and energy efficient improvements in their building industry, highlighting the need for more empirical 
research in these developing countries. 
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Fig. 2: Vosviewer map of countries associated with the application of IoT in construction. 
3.2 Co-occurrence analysis on IoT application in construction 


Co-occurrence, highlights keywords and have an important role in bibliometric analysis. According to Van Eck 
and Waltman (2014), author keywords should be used for bibliometric analysis to show patterns in current research. 
As a result, author keywords were chosen as the foundation for the current study's co-occurrence maps. The 
threshold of occurrences of a keyword was chosen at 15 based on recent bibliometric literature review 
(Baghalzadeh Shishehgarkhaneh et al., 2022). Repeated terms (for example (“smart house” and "smart homes”) 
were eliminated. The criteria for the study were fulfilled by 89 of the 10,008 keywords. The co-occurrence 
network's large nodes (frames) and colour presentations, as well as the primary linkages, were investigated to 
analyse the research hotspots and concerns dominating IoT literature. The cluster formation was used in the co- 
occurrence analysis. 


3.2.1 | Co-occurrence of author keyword by cluster 


The keywords “Smart buildings”, “Energy efficiency” and “Internet of Things” have large nodes in the network 
as seen in Figure 3, indicating researchers have been more interested in studying these areas of research and their 
similar concepts. Six clusters, as shown in Figure 3 emerge following the analysis. 


Cluster 1 (IoT based sustainable construction design): It is in red and the largest cluster with 28 items: building 
energy efficiency, green building, building information model, thermal energy storage, solar energy are some of 
the relevant keywords in this cluster. This cluster indicated a strong focus on sustainable construction design in 
relation to IOT application to buildings. From a design perspective extensive IoT research has focused on Building 
energy efficiency, especially due to this area of research been one of the main goals of construction design (Lawal 
& Rafsanjani, 2022). 


Cluster 2 (Building automation system): It is green with 17 items; building energy management system, anomaly 
detection, building automation, model predictive control, building control, hvac, indoor air quality and thermal 
comfort are some of the relevant keywords emerging from this cluster. This cluster is concerned with IoT based 
building automation systems and the efficient control and management of building energy services. 


Fig. 3: keyword network visualization by cluster 
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Cluster 3 (AI in smart buildings): It is in blue and is made up of 17 items: machine learning, sensors, AI (artificial 
intelligence), internet of things, activity recognition, ambient intelligence, big data are some relevant keywords 
from this cluster. This cluster highlights the core operational principles of IoT, it explores the concept of AI and its 
contribution to the realisation of smart buildings. 


Cluster 4 (oT for efficient energy management in buildings); It is yellow and includes 10 items; Energy 
management, demand response, smart grid, micro grid and home energy management system are some of the 
relevant keywords. This cluster highlights various technical functions of an IoT system to efficiently manage 
energy consumption in buildings. 


Cluster 5 (Improved quality-of-service in smart buildings); It is purple and includes 9 items; wireless sensor 
network, neural network and Interoperability are some of the relevant keywords in this cluster. This cluster 
highlights the improved quality-of-service offered by IoT networks to smart building occupants or smart users. 


Cluster 6 (Blockchain in smart buildings); It is light blue in colour and includes 8 items; authentication, 
blockchain, smart city, privacy and security are some of the relevant keywords in this cluster. This cluster is 
centered around the use of blockchain in IoT based smart building systems and the role of the technology in 
protecting user privacy and safeguarding information that flows through IoT devices or nodes. 


4. DISCUSSION OF CLUSTERS AND FUTURE RESEARCH DIRECTION 


In this section, the six study fields (Blockchain in smart buildings, AI in smart buildings, IoT for efficient energy 
management, Improved quality-of-service in smart buildings, building automation and IoT based sustainable 
construction design) emerging from the results of the co-occurrence cluster analysis are discussed. The knowledge 
gaps and future study directions are also highlighted. 


4.1 Blockchain in smart buildings 


Blockchain is an emerging concept in IoT based technology with strong links to smart buildings and smart cities. 
Blockchain has emerged as an important layer of trust (Siountri et al., 2020). as well as a novel approach for 
improving data integrity and privacy in smart buildings. Blockchain, a type of distributed ledger, can be utilized 
to reduce the challenges of information sharing and security in smart buildings (Rejeb et al., 2022). Unlike 
traditional databases, blockchain is built on a peer-to-peer network design in which all network users handle 
transactions effectively and flexibly rather than being controlled by a trusted centralized authority (Nguyen et al., 
2020). It is conceivable to construct and update smart networks in the future, strengthen their resilience, and 
safeguard a rising amount and diversity of services by relying on blockchain's decentralization, immutability, and 
accountability. In this regard, blockchain has the potential to address important security concerns while also 
facilitating smart city operations (Rejeb et al., 2021). Machine learning is utilised to extract vital information from 
outsourced data and the findings are then stored securely on the blockchain to ease sharing (Rejeb et al., 2022). 


The overall effect of blockchain and IoT can be improved with the introduction of 5G. Blockchain technology 
provides numerous answers to the issues posed by 5G networks. According to Azzaoui et al. (2020), the technology 
supports Al-powered 5G and leads to the development of a more efficient, and secure cellular network. As a result, 
5G will serve as a foundation for IoT, blockchain and mobile edge computing (MEC), enhancing the analysis, 
collection and exploitation of smart building data (Hemmings, 2020) and making bandwidth less of a limiting 
factor in overall ecosystem design. With the transition to 5G networks, there is increased interest in investigating 
the pending issues of a 5G-enabled IoT for blockchain-based smart building applications, as 5G cellular networks 
are ineffective due to increasingly complex configuration issues in clouds and systems lacking AI functionality 
(Chen et al., 2020). As a result, future research should look at how blockchain might affect critical elements of 
IoT-based smart building applications (Rejeb et al., 2022). 


4.2 Alin smart buildings 


Artificial intelligence (AI) is another key area in IoT systems with strong links to smart buildings. The transition 
to smart buildings necessitates the collaboration of several technologies to make inhabitants’ lives more convenient 
and inclusive (Ahad et al., 2020). AI is regarded as a critical tool for advancing urban sustainability and building 
more inclusive and secure settings, as supported by the United Nations' Sustainable Development Goal 11 
(Sustainable development goal (SDG), 2015). AI systems rely on massive amounts of data and employ learning 
algorithms to discover patterns in the data, allowing for event prediction and decision-making tasks (Hariri et al., 
2019). This is especially essential when combined with other developing technologies that increase efficiency by 
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automating data collecting and eliminating the need for trusted third parties, hence maximizing profitability Hariri 
et al., 2019). In the work of Awan et al. (2020), for example, machine learning algorithms used to estimate parking 
spot availability in smart commercial buildings are thought to benefit from data collected by IoT sensors and 
devices. 


Future research into this cluster may focus on how urban decision-makers might collaborate with residents to 
develop and create smart cities that meet their expectations. The subject of how to involve all stakeholders in a 
solution-oriented and citizen-centric manner while solving urban difficulties with IoT and AI approaches is of 
greater relevance (Brynskov, 2018). Empirical research is also required to better understand the stakeholder-related 
issues that enable or impede IoT and AI implementation in smart construction (Rejeb et al., 2022). 


4.3 IoT for efficient energy management in buildings 


Energy management is an empirical function of smart buildings with strong links IoT application to construction. 
The associated cluster describes how energy management in relation to IoT technology is gaining attention, with 
numerous sophisticated and ubiquitous smart construction applications, such as smart grids. Smart grids are 
modern power networks that can change and re-adjust dynamically to offer electricity at a cheap cost and high 
quality (Alsamhi et al., 2019). Since IoT applications consume a significant amount of energy, smart building 
solutions must be able to use energy more effectively and implement effective energy prediction systems that 
reflect the dynamics of the IoT environment (Luo et al., 2019). 


The smart grid enables the exchange of energy and information between customers and utilities. Yet, the 
complexity of utilities to handling real-time data for making business-critical choices remains a difficulty (Alsamhi 
et al., 2019). Increased data usage improves grid stability and performance while also allowing the utility provider 
to make better decisions, allowing for efficient demand-side management and demand response. Nevertheless, the 
massive amount of raw data is incomprehensible or useless without a dependable and consistent capacity to process, 
analyse, and comprehend the information contained within such a massive amount of data. As a result, before 
taking action based on the data, the data must be turned into useable information. Such transformation is a difficult 
procedure since helpful information is not readily apparent from the data. Therefore, further investigation on the 
challenges faced by utilities to handle real-time data is essential (Syed et al., 2021). 


4.4 Improved quality-of-service in smart buildings 


Wireless sensor network (WSN) represents an important component of IoT by promoting resource efficiency and 
increasing smart inhabitants' quality of life (AlSawafi et al., 2020). WSN can handle large-scale installations in 
any metropolitan setting to perform tasks including real-time monitoring of physical and environmental conditions, 
routing and load balancing, industrial process monitoring, and energy efficiency optimization (Alsamhi et al., 
2019). Researchers have previously focused on WSN-based smart building and city applications for scheduling 
and routing (e.g., smart grids) that take into consideration the energy efficiency and quality of service (QoS) 
(Faheem et al., 2019). Nonetheless, the mobility and changing network topologies of IoT nodes continue to pose 
challenges to the stringent fulfilment of QoS requirements in IoT-based smart building applications. As a result, 
future research must adapt current routing algorithms in WSN to give QoS guarantee in terms of latency, 
dependability, bandwidth usage, scalability, and throughput. Moreover, researchers must investigate low-cost 
methods of connecting IoT equipment and collecting data across the vast number of decentralised WSN in smart 
cities (Sobin, 2020). 


4.5 IoT based sustainable construction design 


Building energy efficiency is another key area with strong link to IoTs application to buildings. According to recent 
studies, tracking energy use in buildings has piqued the interest of many academics interested in IoT and energy 
saving measures (Xu et al., 2020). Furthermore, the drive towards combining smart buildings with cutting-edge 
detection techniques has begun to set the framework for seeing IoT as an essential component of smart cities (Al- 
Obaidi et al., 2022). Recent research has revealed a surge in interest in IoT applications in smart buildings to 
enhance energy efficiency and decrease environmental concerns. According to other research, if buildings consider 
effective communication between their systems for operation, they can save a significant amount of energy. As a 
result of advancements in networking, computing, and sensing technologies, IoT has emerged as a critical 
component in the design and operation of any smart item in the built (Kumar et al., 2022). 


Smart design, smart action, smart control, smart monitoring, smart energy, smart waste and smart water are 
essential features of an IoT residential/commercial building that should be considered while converting a building 
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to a smart one to make the atmosphere more comfortable not only for the residents but also for the management 
staff (Lawal & Rafsanjani, 2022).These features necessitate many types of data, and thus the key difficulty might 
be big data analytics, which arises from the huge, diversified, and time-evolving high-resolution data provided by 
IoT devices and sensors. The growth of technology has resulted in a dramatic increase in the number of connected 
IoT devices, resulting in massive data generation and transfer. To improve data flow between devices many 
different technologies are necessary, which raises the complexity of IoT systems in every kind and size of 
residential or commercial structure. Based on our analysis of the literature, smart waste and water have seldom 
been investigated and so additional future research into these aspects is advised to create a completely automated 
building (Lawal & Rafsanjani, 2022). Furthermore, IoT device batteries consume a large amount of energy and as 
a result, the rising rate of IoT device deployment has resulted in increased energy consumption making IoT and 
environmental concern which should be considered in future research (Lawal & Rafsanjani, 2022). 


4.6 Buildings automation system 


Building automation is yet another integral part of IoTs application to building with strong links to the research 
area. A building automation control system (BACS) is defined as a computer-based and automated system that 
analyses the specific needs of a building by controlling the associated mechanical and electrical plants/equipment 
installed in the building, thereby contributing to energy savings without compromising user thermal/visual comfort. 
A BACS's major goal is to maintain occupant thermal/visual comfort while maintaining an energy efficient and 
cost-effective building operation. It incorporates algorithms that replace user demands in directing technological 
systems depending on various objectives, such as; thermal comfort, Energy savings and cost savings 
(Gholamzadehmir et al., 2020). 


The use of model predictive control, which also has a direct link to IoT application in buildings is a technique in 
BACS which has recently attracted a lot of interest from the scholarly community (Serale et al., 2018). An 
advanced control strategy (ACS) with a forecasting function, known as MPC, is necessary to achieve high energy 
and comfort performance levels by including renewable energy generation, innovative solutions for technical 
systems (e.g., heat pumps), and energy storage systems (Gholamzadehmir et al., 2020). MPC is commonly used 
in the building industry to forecast the dynamic behavior of systems in the future and alter reaction by the controller, 
accordingly, resulting in energy and cost savings while maintaining thermal comfort (Serale et al., 2018). 


The evaluation of an accurate prediction horizon based on the system's characteristics is one of the fundamental 
concerns in predictive control systems (Gholamzadehmir et al., 2020). According to the literature study, the most 
typical setting for prediction and horizon control is one day ahead (Liu & Heiselberg, 2019). Nevertheless, the 
tuning of the prediction and control horizons may be influenced by the building boundary circumstances, such as 
the climate environment and building characteristics (Gholamzadehmir et al., 2020). There is particularly limited 
data on the link between the prediction horizon and control horizon. As a result, additional research is needed to 
analyse the relationship between prediction/control horizon and various building boundary conditions for best 
energy and cost saving outcomes. Because the predictive model is dependent on the quality of the input data, the 
function of sensor reliability and location are critical. There are just a few publications that report on this crucial 
issue and therefore research is required to fill this vacuum, which would otherwise be a weak spot for ACS 
(Gholamzadehmir et al., 2020). 


Based on the study findings as discussed, predominant and emerging concepts that can serve as conduits for IoTs 
application in Buildings and the proposed directions for advancing research and practice are summarized in Table 
2. Future investigations could pay more attention to the current and emerging concepts in IoT and its application 
to buildings. 


Table 2: Themes, research area and future research direction 


Theme Research areas and concepts with links to IoT Future research direction 


application in Buildings 
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Blockchain in IoT based smart 


buildings 


AI in smart buildings 


IoT for efficient energy 


management 


Improved quality-of-service in 


smart buildings 


IoT based 


construction design 


sustainable 


Building automation system 


Blockchain 
Privacy 
Smart city 


Security 


Machine learning 
AI 

Sensors 

Activity recognition 


Ambient intelligence 


Energy management 
Smart grid 

Micro grid 

Demand response 


Home energy management system 


Wireless sensor networks 
Interoperability 


Neural network 


Building energy Efficiency 

Building Information Model 
Building envelope 

Green building 

Thermal energy storage 

Solar energy 

Sustainability 

Building energy management system 


Anomaly detection 


1080 


Issues of a 5G-enabled IoT for blockchain- 
based smart building. applications 


Cost-effective and scalable blockchain 


solutions. 


Stakeholder-related issues that enable or 
impede IoT and AI implementation in 
smart buildings. 


How to involve all stakeholders in a 
solution-oriented and _ citizen-centric 
manner while solving urban difficulties 


with IoT and AI approaches 


The complexity of utilities to handling 
real-time data for making business-critical 


choices 


Adapt current routing algorithms in WSN 
to give QoS guarantee in terms of latency, 
dependability, 
scalability, and throughput. 


bandwidth usage, 


low-cost methods of connecting IoT 
equipment and collecting data across the 
vast number of decentralised WSN in 


smart cities. 


Investigation on smart waste and smart 


water 


Negative effects of oT as an 


environmental concern 


Analyse the relationship between 
prediction/control horizon and various 
building boundary conditions for best 


energy and cost saving outcomes 
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Building automation Reliability and proper location of IoT 
Sensors. 

Model predictive control 

Building control 

Hvac 


Indoor air quality 


thermal comfort 


5. CONCLUSION 


This study conducts a bibliometric review of the extant literature on IoT application in construction from 2008 to 
2023 to tease out critical measures for its integration and transformation. In this study, 4875 publications on IoT 
within the building and construction sector retrieved from Scopus were analyzed using bibliometrics and network 
analysis in VOSviewer. 


The demographic maturity levels and increased prevalence are most notably from China, Italy, the USA and UK 
Nevertheless, trends in recent IoT research are emerging from developing countries, indicating a surge for 
sustainable improvements in their construction practices. To enhance IoT research globally, developed and 
developing countries need to collaborate. The poor collaborative links between IoT researchers across developed 
and developing countries may be one of the reasons contributing to the slow understanding and uptake of IoT 
systems in developing economies. Therefore, funding research projects, research hubs and spoke networks and 
other collaborative research activities as appropriate between developed and developing countries should be highly 
encouraged. These can facilitate knowledge exchange and transfer on policies and implementation strategies to 
promote IoT application practice in construction. 


Six cluster areas were identified including blockchain in smart buildings, IoT based sustainable construction design, 
Improved quality-of-service in smart buildings, building automation system, IoT for efficient energy management 
in buildings and AI in smart buildings. These areas currently seek to optimize energy efficiency in buildings, reduce 
waste and the environmental impact throughout a building's operation. There are emerging concepts from the 
cluster and there is the need to expound their links, especially block chain. This is with the view of foreseeing a 
more strategic approach for improving data integrity and privacy in smart buildings and cities. 


This study contributed by highlighting the bibliometric research status of IoT in relation to its application in 
buildings, identified current gaps in the literature and provided directions for future studies and practice. More 
importantly, the evidence gleaned from this study would help IoT players and policymakers to develop bespoke 
strategies, frameworks and policy measures for integrating and implementing IoT practices, creating opportunities 
to transition to more sustainable systems in the construction sector. However, there were some limitations. One is 
the use of only Scopus database. Second is the use of only Journal articles and reviews written in English, and 
third is the exclusion of discussions of other emerging areas because they had no current links to IoT. Future 
research may use other databases or incorporate data from different sources to enhance generalizability. They can 
also broaden the sources of documents such as books and include those in foreign languages, to broaden the variety 
of data. Future research might investigate additional growing sectors where there are presently no linkages to IoT. 
In addition, expert systems and fuzzy tools can be used to explore a more in-depth quantitative analysis. 
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ABSTRACT: The great challenge of global climate change urges world economies to reduce greenhouse gas 
emissions and promote sustainable development, where the building sector plays a vital role. Carbon tracking 
technology is one of the keys to capturing carbon emissions for sustainable construction such as net-zero buildings. 
This paper reviews five key carbon tracking technologies — life cycle assessment (LCA), energy modeling, building 
operation monitoring, carbon accounting software, and green certification and rating systems. With summarized 
advantages, beneficiaries, and limitations of the five technologies, we propose a Carbon Tracking ‘Cabbage’ 
(CTC) framework that incorporates all carbon tracking tools as inner technological layers for multiple 
stakeholders at multiple stages of construction management. The main contribution of this paper is the CTC 
framework that rationalizes the scopes and adoption strategies of carbon tracking technologies by collaborative 
stakeholders to achieve informed decision-making, implement effective carbon reduction strategies, and 
subsequently contribute to climate change mitigation actively. 


KEYWORDS: Carbon tracking; Building sector; Carbon tracking cabbage framework; multi-stakeholder; 
Technology adoption 


1. INTRODUCTION 


Global climate change poses an increasingly severe challenge to human society and ecosystem. Carbon emissions, 
as a major source of greenhouse gases, have been widely recognized as one of the primary drivers of climate 
change. In light of this, establishing a sustainable green economy through collective global efforts has become a 
pressing priority. The building sector plays a crucial role in global carbon emissions, accounting for a significant 
portion of the global greenhouse gas output (Khalili & Chua, 2013). As society strives to combat climate change 
and become carbon neutral, addressing the carbon footprints of buildings and adopting sustainable practices in the 
construction and operation of these structures has become a top priority. 


Tracking and monitoring carbon emissions, particularly carbon dioxide (CO2), reveal evidence and insights into 
the amount of carbon in the atmosphere, enabling targeted strategies to reduce emissions and mitigate the impacts 
of climate change (Liu et al., 2020). Carbon tracking, as an essential component of the broader carbon management 
strategy, has emerged as a powerful tool to measure, monitor, and mitigate the carbon impact of buildings (Liu et 
al., 2020). Additionally, carbon tracking provides crucial data for setting emission reduction goals, empowering 
governments, organizations, and industries to establish clear and achievable targets while monitoring progress. 
Through carbon tracking and reporting, businesses can measure the carbon footprint, identify improvement 
opportunities, transparently disclose environmental impacts to stakeholders, and showcase the social responsibility. 
Furthermore, standardized carbon tracking and reporting methods foster global cooperation, which enables 
international transparency, peer pressure, and collaboration in achieving shared climate goals (Kang et al., 2015). 
In summary, carbon tracking and reporting are essential tools in combating climate change, providing vital data to 
empower decision-makers, businesses, and individuals to take proactive action in creating a more sustainable and 
resilient future. 


The significance of carbon tracking technologies in the building sector lies in measurability and interpretable 
evidence of comprehensive carbon emissions across the entire lifecycle of a building. From the production and 
transportation of construction materials to energy consumption during building operations and eventual demolition, 
these technologies offer valuable insights into the carbon impact of each stage, enabling informed decision-making 
and targeted carbon reduction strategies. Over the years, carbon tracking technologies in the building sector have 
undergone significant advancements and innovations (Xu et al., 2023). From traditional methodologies to cutting- 
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edge digital solutions, these tools have played a pivotal role in quantifying the carbon footprint of buildings at 
different stages of their lifecycle. By providing a holistic assessment of carbon emissions, these technologies 
empower stakeholders, including architects, engineers, policymakers, and building owners, to make informed 
decisions that foster sustainable building practices. 


However, current carbon tracking technologies fall short of encompassing the entire building lifecycle in the 
building sector. Although individual technology is advantageous, carbon tracking as a whole is always fragmented 
and incomplete. Subsequently, the fragmented and incomplete carbon tracking hinders holistic and effective carbon 
reduction strategies and leads to missed opportunities for emission mitigation and sustainable practices. Thus, there 
is a significant research gap in a comprehensive approach that spans carbon tracking to all stages of a building’s 
lifecycle. 


This paper embarks on a novel framework that rationalizes the full-lifecycle carbon tracking based on an in-depth 

exploration of all the carbon tracking technologies in the building sector. By reviewing historical milestones, 

technological breakthroughs, and real-world applications, we aim to present a comprehensive overview of the 

evolution of these technologies and their impact on the industry’s sustainability efforts. The primary objectives of 

this study are: 

© To provide an in-depth analysis of the historical evolution of carbon tracking technologies in the building 
sector, highlighting key milestones and breakthroughs that have shaped their current state; 

© To examine the existing carbon tracking technology models, methodologies, and tools deployed in the building 
industry, assessing their effectiveness, limitations, and potential for future enhancements; and 

@ To explore real-world case studies and successful implementations of carbon tracking strategies, showcasing 
how these technologies have contributed to carbon reduction goals and sustainable building practices. 


2. LITERATURE REVIEW 
2.1 Existing carbon tracking technologies 


As shown in Fig. 1, there are five main categories of carbon tracking techniques. They are LCA, energy modeling, 
building operation monitoring, carbon accounting software, and green certifications and rating systems. The 
associated phases and key technologies are sketched along the of a curved arrow of a typical construction project 
course in Fig. 1. 
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Fig. 1: Conceptual map of existing carbon tracking technologies. 


LCA. LCA is a systematic method used to assess the carbon emissions generated throughout the entire lifecycle of 
a building, including raw material procurement, construction, operation, and demolition (Dodoo et al., 2014). By 
comprehensively regarding the environmental impacts at each stage, LCA provides comprehensive carbon 
emission data, guiding design and material selection to achieve sustainable and low-carbon building solutions. The 
strength of LCA lies in its comprehensiveness, such as material production, transportation, construction, and 
dismantling, beyond the building’s use phase. Thus, LCA offers a more holistic evaluation of the building’s 
environmental impact. 


Energy Modeling. Energy modeling is a method that involves simulating a building’s energy consumption using 
specialized software, thereby estimating carbon emissions. These models integrate factors such as building design, 
energy systems, and climate conditions, providing architects and energy experts with optimized strategies to 
enhance energy efficiency and reduce carbon emissions (Wang et al., 2021). Energy modeling allows decision- 
makers to predict a building’s energy performance under different conditions during the design phase, facilitating 
environmentally friendly and energy-efficient choices. Additionally, it enables continuous monitoring and 
optimization of energy usage during the subsequent operational phase. 


Building Operation Monitoring. Building operation monitoring entails the installation of sensors and monitoring 
systems to collect real-time data on energy consumption and emissions. These monitoring systems help building 
managers gain better insights into energy usage, promptly identify potential energy waste, and implement measures 
to reduce carbon emissions (Bilec et al., 2010). Continuous monitoring enables building managers to track energy 
performance and make timely adjustments and improvements to achieve sustained carbon reduction. 
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Carbon accounting software. Carbon accounting software is a specialized tool used to track and record carbon 
emission data for building projects and companies. These software solutions typically offer data collection, 
analysis, and reporting functionalities, facilitating the formulation of carbon reduction strategies and supporting 
the realization of carbon neutrality and emission reduction goals (Liu et al., 2019). Carbon accounting software 
empowers the building industry to efficiently collect and manage carbon emission data, providing support for 
achieving carbon neutrality and emission reduction objectives. 


Green certifications and rating systems. Some countries and regions have introduced green building certification 
and rating systems, such as LEED, BREEAM, and Green Building Labels. These systems comprehensively assess 
a building’s environmental performance, including carbon emissions, thus encouraging the adoption of more 
environmentally friendly design and operational practices in the construction industry. Participation in green 
certifications and ratings allows building projects to gain recognized environmental recognition, enhance market 
competitiveness, and contribute to sustainable development (Wang, Teng, et al., 2021). These green certification 
systems provide a clear goal for the building industry, driving the sector towards a more environmentally friendly 
and low-carbon direction (Chang et al., 2016). By comprehensively applying these carbon tracking technologies 
and practices, the building industry can play a proactive role in addressing global climate change and collectively 
create a more sustainable and greener future. 


2.2 Advantages, beneficiaries and limitations of carbon tracking 


This subsection provides an in-depth examination of the advantages and limitations of different carbon tracking 
technologies used in the building sector. The technologies discussed include LCA, Energy Modeling, Building 
Operation Monitoring, Carbon Accounting Software, and Green Certifications and Rating Systems. Each 
technology’s advantages and limitations are summarized for comparison. The beneficiaries are also analyzed to 
gain a comprehensive understanding of the benefits in sustainable and low-carbon building practices. 


Table 1: Advantages, beneficiaries, and limitations of carbon tracking technologies. 


Carbon tracking tech. Advantages Beneficiaries Limitations 
© Comprehensive and holistic @ Building owners and © Data-intensive and time- 
LCA . 
approach developers consuming 
@ Enables optimized design and © Government agencies and @ Dependent on data quality and 
material selection regulatory authorities availability 
@ Environmental organizations 
and NGOs 
@ Virtual simulations for energy @ Architects and engineers @ Relies on input assumptions 
Energy Modelin: ; 
ey 8 efficiency @ Energy managers and model accuracy 
@ Allows iterative design @ Real-world performance may 
improvements differ from predictions 
@ Provides real-time data on @ Building owners and © Requires infrastructure of 
Building Operation . 
a energy consumption developers sensors and data systems 
Monitoring . ; 
@ Identifies energy wastage and @ Energy managers @ Interpretation may need 
reduction potential specialized expertise 
@ Streamlines data collection @ Building owners and © Choosing appropriate 
Carbon Accountin z : 
g and analysis developers software can be challenging 


Software 
@ Supports progress tracking @ Government agencies and ©@ Reliability depends on data 
accuracy and completeness 


and mitigation strategies regulatory authorities 


© Offers standardized @ Building owners and © Time-consuming and 


Green Certifications and : : 
resource-intensive 


Rating Systems 


benchmarks for sustainability 


Incentivizes eco-friendly 


developers 


Environmental organizations 


certification 


and NGOs @ Potential gap between 


predicted and realized 


design and practices 


outcomes 


LCA’s strength lies in the comprehensive assessment capability, which takes into account the environmental impact 
ofthe life cycle stages of the building. LCA provides comprehensive carbon emission data and other environmental 
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indicators and helps decision makers to fully understand the environmental performance of the building. Through 
LCA, builders can compare the environmental performances of different design and material options to make more 
informed choices, optimize building design and material choices, reduce carbon emissions and environmental 
impact, and achieve sustainability goals (Hong et al., 2015). However, LCA also has some challenges and 
drawbacks. Its complexity is a significant problem. The implementation of LCA is complex and costly, because it 
requires a large amount of data collection, analysis and calculation, and has high technical and professional 
requirements. In addition, the reliability of LCA depends on the quality of the data and the reliability of data source. 
Therefore, incomplete or missing data can lead to uncertainty in the results and misled decisions. At the same time, 
conducting LCA also requires a significant investment of time and resources, which may become impractical or 
difficult to apply to complex construction projects. 


Energy modeling can provide architects and energy experts with comprehensive data and information to gain 
insights into a building’s energy use. With the simulation results of a building’s energy consumption under different 
conditions, decision makers can predict a building’s energy performance and make environmentally friendly and 
energy efficient decisions at the early design stage. Secondly, energy modeling integrates factors such as building 
design, energy system and climate conditions to provide solutions for building projects to optimize energy 
efficiency and reduce carbon emissions. Therefore, energy modeling can reduce energy costs and environmental 
impact throughout the life cycle (Li & Chen, 2017). However, energy modeling’s reliability depends on the 
accuracies of both input data and the building model. Inaccurate data or modeling can lead to biased results and 
ineffective decisions. Secondly, energy modeling requires a high level of technology and expertise, and there may 
be barriers to learning and application for some builders. In addition, energy modeling also requires a certain 
amount of time and resource investment, especially for complex construction projects, which may increase the 
difficulty and cost of implementation. 


Building operations monitoring provides real-time and accurate energy consumption data to help building 
managers get a complete picture of a building’s energy use. Through continuous monitoring, managers can 
immediately grasp the energy consumption of the building, find potential energy waste and problems in time, and 
provide a basis for taking targeted measures. Secondly, operational monitoring can help optimize a building’s 
energy use and operational strategies to achieve ongoing carbon reduction and energy conservation goals. By 
aligning operations with actual data, builders can reduce carbon emissions and energy costs and improve 
operational efficiency (Geng et al., 2022). However, the installation and maintenance of building monitoring 
systems may require certain inputs and costs. The selection, installation and commissioning of sensors and 
monitoring equipment require specialized technical support. Secondly, the processing and analysis of large 
amounts of real-time data may also require certain technical and management capabilities (Geng et al., 2022). For 
some builders, it may be necessary to train and improve the data analysis and operations management skills of 
relevant personnel. In addition, the building monitoring system also faces the problem of data security and privacy 
protection, and it is necessary to establish a reasonable data management and protection mechanism to ensure data 
security and compliance. 


Carbon accounting software enables efficient data collection and management. By automating data acquisition and 
processing, manual operation and time cost can be greatly reduced, and the accuracy and reliability of data can be 
improved. Secondly, carbon accounting software provides powerful data analysis and reporting capabilities, 
capable of turning complex carbon emissions data into intuitive charts and reports to provide clear insight and 
guidance for decision-makers (Long et al., 2018). This helps to develop carbon reduction strategies and track 
progress, driving the construction industry in a lower carbon direction. However, choosing the right carbon 
accounting software for your needs requires consideration of a number of factors, including the software’s 
functionality, compatibility, price, and user-friendliness. Different software may be suitable for different sizes and 
types of construction projects, so careful evaluation and selection are required. Secondly, it is necessary to ensure 
the accuracy and integrity of the data during the use of the software, otherwise, it may lead to errors and inaccurate 
analysis of the results. Therefore, the construction industry needs to establish a reasonable data collection and 
verification mechanism to ensure that the software output data is reliable and usable. 


The Green certification and rating system provides the construction industry with clear environmental standards 
and targets, driving the construction industry to adopt more environmentally friendly and sustainable design and 
operation practices. By participating in certification and rating, construction projects can receive recognized 
environmental recognition, improve their market competitiveness, and attract more environmentally conscious 
customers and investors. Secondly, the green certification and rating system takes into account the environmental 
performance of the building, including carbon emissions, energy efficiency, material use and indoor environmental 
quality, so as to achieve comprehensive environmental benefits and sustainable development (Ma et al., 2020). 
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However, some rating systems may be too complex and cumbersome, requiring large amounts of data and proof 
to meet certification standards, increasing costs and burdens for builders. Secondly, the certification and rating 
process can be time-consuming, affecting the schedule and operation of the project. In addition, sometimes the 
rating results may only reflect the design and planning stages of the building and do not actually take into account 
the actual operation and use of the building, and therefore may be biased from the actual environmental 


performance. 


3. THE PROPOSED CARBON TRACKING ‘CABBAGE’ FRAMEWORK 


Fig. 2 presents the proposed Carbon Tracking ‘Cabbage’ (CTC) framework. The architecture of CTC framework 
represents the characteristics and beneficiaries of all the five technologies. The integrated framework can enable 
builders and all stakeholders to get a comprehensive picture of the carbon emissions of buildings. The quantitative 
carbon tracking results also enables carbon reduction strategies and plans in a data-driven manner, driving the 
construction industry towards a greener and low-carbon direction in the digital era. Incorporating a multi- 
stakeholder engagement approach, the CTC framework fosters collaboration among diverse participants, including 
architects, engineers, policymakers, and environmental experts, creating a synergistic effort to address carbon 
emissions and advance sustainability within the construction industry. With integrated carbon tracking 
technologies and evidenced-based multi-stakeholder practices, the construction industry can play an active role in 
contributing to the global response to climate change and co-creating a more sustainable and green future. 
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Fig. 2: The proposed Carbon Tracking ‘Cabbage’ framework in this paper. 


3.1 Rationale 


Carbon tracking and management becomes distinctly robust and strategic based on the CTC framework. This 
comprehensive approach stems from the imperative to synergize diverse data streams, resulting in a panoramic 
understanding of an organization’s carbon footprint and the cultivation of a holistic strategy for sustainable 
practices. By seamlessly integrating techniques like LCA, Energy Modeling, Building Operation Monitoring, 
Carbon Accounting Software, and Green Certifications and Rating Systems, stakeholders attain a multifaceted 
perspective on their carbon emissions, uncovering insights that span the entire spectrum of product lifecycles. 


Energy modeling serves as a vital supplement to this perspective, shedding light on emissions linked to energy 
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usage, while real-time operational data from building operation monitoring introduces an agile layer of information. 
Carbon accounting software meticulously quantifies emissions, and green certifications provide benchmarks for 
measuring sustainability performance. The true power of integration lies in its ability to provide a nuanced analysis 
of emissions' sources and potential reduction avenues. This comprehensive comprehension empowers 
organizations to precisely identify processes or lifecycle stages that contribute significantly to the carbon footprint, 
facilitating the strategic alignment of reduction efforts with broader sustainability objectives. 


Furthermore, the incorporation of real-time monitoring and reporting, seamlessly facilitated by building operation 
monitoring and carbon accounting software, bestows organizations with the nimbleness to make swift, well- 
informed decisions. The trajectory towards carbon reduction objectives can be closely monitored, enabling agile 
adjustments in real-time that optimize the efficiency of sustainability initiatives. The integration of esteemed green 
certifications and rating systems augments the credibility of these endeavors, ingraining transparency and 
accountability within the organizational approach to sustainability. In essence, this fusion of techniques 
underscores a sagacious and pragmatic approach to surmounting the intricate challenges of carbon reduction, 
laying the essential groundwork for a verdant and more sustainable future, fortified by data-driven insights and 
strategic harmony. 


3.2 Adoption strategies 


The CTC framework provides actionable strategies for stakeholders to adopt in order to achieve their carbon 
reduction goals. By facilitating data integration and collaboration across departments, organizations can ensure a 
seamless flow of information from technologies such as LCA, energy modeling, building operations monitoring, 
carbon accounting, and green certification. This collaborative approach promotes a comprehensive understanding 
of carbon emissions, enabling informed decision-making and targeted mitigation efforts. The implementation of 
advanced technology solutions has become a key strategy. By investing in carbon accounting software, energy 
management systems, and IoT devices for real-time monitoring of building operations, organizations can improve 
their ability to accurately quantify emissions and optimize energy use in a timely manner. A continuous 
improvement cycle is essential, where real-time monitoring data informs regular review and analysis to identify 
trends and areas for improvement. Over time, this iterative process enhances carbon reduction strategies and 
optimizes operations. 


Transparently communicating efforts to reduce carbon emissions to stakeholders demonstrates a commitment to 
sustainability, while incentive programs and recognition boost employee motivation and morale. Long-term 
strategic planning ensures that organizations remain adaptable and resilient in their carbon reduction initiatives. 
Overall, the CTC framework serves as the foundation for implementing these strategies, guiding organizations 
toward a more sustainable and environmentally responsible future. 


3.3 Application scenarios of the CTC framework 


The CTC framework can guide sustainable construction management in different phases of a construction. For 
example, the framework integrates LCA and energy modeling, enabling design teams to assess the environmental 
impact of different design options. This enables the selection of materials, systems and technologies that meet 
carbon reduction targets, creating a solid foundation for sustainable projects. As the construction phase begins, the 
CTC framework maintains its importance. The integration of real-time monitoring and data-driven insights gives 
construction teams the means to track energy consumption and emissions in real time. During construction, rapid 
interventions can be implemented to optimize energy use and reduce carbon emissions, reflecting the immediate 
utility of the framework. 


During the operational phase, the framework facilitates continuous monitoring and ensures the continuous and 
efficient operation of construction projects. Building operation monitoring ensures sustainable performance, while 
carbon accounting software tracks ongoing emissions. In addition, a recognized green certification and rating 
system is used to validate and communicate the sustainable achievements of the project to stakeholders, promoting 
transparency and accountability. Finally, during the retrofit and retrofit phases, the CTC framework guides 
informed decision-making by assessing the carbon emissions impact of retrofit choices. By seamlessly integrating 
LCA and energy modeling, retrofit teams can strategically upgrade systems, materials, and technologies to achieve 
the best carbon reduction outcomes. 


All in all, the CTC framework serves as a comprehensive and adaptable tool for effective and sustainable 
construction management at all stages of a construction project. By leveraging its integrated technology, 
stakeholders are able to make informed, data-driven decisions that align with carbon reduction targets, advance 
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environmentally responsible building practices, and contribute to a greener future. 


4. DISCUSSION 


As carbon emissions continue to be the primary driver of climate change, tracking and mitigating carbon emissions 
has become a top priority for governments, organizations and industries around the world. The construction 
sector’s significant contribution to global carbon emissions underscores the need to focus on reducing the carbon 
footprint of buildings throughout their life cycle. Carbon tracking becomes a powerful tool for measuring, 
monitoring and mitigating the carbon impact of buildings. By tracking and monitoring carbon emissions, 
policymakers gain valuable insights into the amount of carbon in the atmosphere, enabling targeted mitigation and 
climate change strategies. Carbon tracking also helps to set clear and achievable emission reduction targets and 
promotes global cooperation and transparency to achieve common climate goals. 


In addition, it is important to standardize carbon tracking and reporting methods to ensure consistency and 
comparability in international Settings. These technologies enable businesses to measure their carbon footprint, 
identify opportunities for improvement, and demonstrate to stakeholders their commitment to environmental 
responsibility. In the construction sector, carbon tracking technology provides a comprehensive understanding of 
the carbon emissions of buildings throughout their life cycle. From construction through operation to final 
demolition, these tools provide insights for informed decision-making and promoting sustainable building 
practices. Throughout the inquiry, the focus remained on the development and advancement of carbon tracking 
technology. From traditional methods to cutting-edge digital solutions, these technologies continue to improve our 
ability to quantify and understand carbon emissions, making them indispensable in our fight against climate change. 


Overall, carbon tracking technologies enable stakeholders to make informed choices, assess the environmental 
impact of their actions, and work together to build a more sustainable and resilient future. By providing valuable 
data and insights, these technologies are transforming the construction industry and moving us closer to a carbon- 
neutral and healthier planet. 


5. CONCLUSION 


In summary, this comprehensive review highlights the significance of carbon tracking technologies in the building 
sector for combating climate change and promoting sustainable practices. The five key technologies—LCA, 
Energy Modeling, Building Operation Monitoring, Carbon Accounting Software, and Green Certifications and 
Rating Systems—each play essential roles in understanding, monitoring, and reducing carbon emissions in 
buildings. LCA provides a comprehensive view of a building’s carbon footprint across its entire life cycle, guiding 
sustainable design and material choices. Energy Modeling allows for energy consumption simulation and 
optimization during the operational phase, enabling the early implementation of energy-efficient strategies. 
Building Operation Monitoring offers real-time data collection to understand and reduce energy wastage. Carbon 
Accounting Software tracks emissions and sets reduction goals, while Green Certifications and Rating Systems 
incentivize sustainable practices. 


A proposed Carbon Tracking ‘Cabbage’ (CTC) framework integrates all the technologies and empowered 
stakeholders for data-driven decision-making, carbon reduction strategies, and sustainability initiatives. 
Collaboration among researchers, policymakers, and industry professionals, as sketched in the CTC framework, 
is crucial for implementing this framework and achieving a greener and more sustainable future. By embracing 
carbon tracking technologies under the CTC framework, the building sector can actively contribute to mitigating 
climate change and creating a resilient and environmentally responsible world. 


However, the CTC framework in this paper is conceptual. We require a system platform and pilots in the industry 
to realize and validate the effectiveness of the CTC framework. Furthermore, the arguments and discussion only 
flows at the surface of general cases. Fine adjustments are necessary for each construction ‘niche’ industry to meet 
construction industrial standards, supply chains, and cultures for carbon tracking of the whole life cycles of 
buildings. 
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ABSTRACT: A large body of research has been developed with the aim of assisting policymakers in setting 
ambitious and achievable environmental targets for the retrofit of current and future building types for energy- 
efficiency and in creating effective retrofit strategies to meet these targets. The aim of this research is to conduct a 
comprehensive study to identify the relationship between building type and sustainability, with a particular 
emphasis on retrofitting and try to identify research gaps in the most effective energy-saving strategies for 
retrofitting various types of buildings. In this regard, this study conducts a systematic literature review (SLR) 
utilizes artificial intelligence (AI) and natural language processing (NLP). Sixty relevant papers are selected and 
reviewed, establishing a comprehensive searching scheme. The research highlights retrofitting strategies for 
improving energy efficiency in buildings and discuss the limitations of current practises in terms of physical and 
technical developments, such as utilising new energy systems and innovative retrofitting materials. To overcome 
these, future studies could focus on in-depth building classification, developing tailored retrofitting alternatives, 
and establishing an adaptive solution framework. This framework aligns cohesively with diverse typologies, 
adapting to changing contexts and enhancing long-term performance. 


KEYWORDS: retrofitting, typology of building, building energy performance, residential buildings. 
1. INTRODUCTION 


Buildings account for 40% of the overall energy consumption in the European Union (Ballarini et al., 2017). 
Improving building energy efficiency is currently considering a top priority by the UK government as a major 
initiative for accelerating the decarbonization agenda for the building industry by 2050. European policy aims to 
achieve a 27% increase in energy efficiency by 2030, primarily by improving the energy efficiency of newly 
constructed buildings. However, the number of new buildings is small compared to the total stock of buildings in 
Europe, accounting for only 1%. Therefore, the most crucial aspect of energy-saving in Europe is retrofitting of 
existing residential buildings (Pungercar et al., 2021). Nevertheless, according to (Ortiz et al., 2020), the UK 
government's main barrier in this regard, tends to be reducing carbon emissions from existing residences. To 
improve the long-term energy performance of the building stocks and reduce carbon emissions, governments 
should develop a strategy to invest in building energy refurbishment(Ballarini et al., 2017). A large body of research 
has been developed to assist policymakers in setting ambitious and achievable environmental targets for converting 
a certain building type to energy-efficient structures and creating effective strategies to meet these targets (Re 
Cecconi et al., 2022). However, building regulations are frequently changed, depending on each country's vision, 
potential, capacity to implement such changes, and the complexity of architectural details and conditions within 
its building stock (Alabid et al., 2022). 


Several variables play a pivotal role in shaping energy consumption within a building, including the building 
envelope's structure, age distribution among existing building stocks, prevailing climate conditions, building area 
and type, the building's age, and the efficiency of its system installations (Beagon et al., 2020). In order to promote 
local or national energy-saving strategies, typical residential building typologies are commonly used to model the 
energy efficiency of building portfolios (Loga et al., 2016). 


Indeed, one crucial aspect that contributes to the complexity of retrofitting residential buildings lies in the fact that 
each building's characteristics can significantly vary based on the environmental conditions of its location. While 
previous research, as highlighted by (Kadri¢, Aganovic, Martinović, et al., 2022), has delved into the challenges 
and opportunities of retrofitting different building types, there remains a notable gap in the literature concerning 
the explicit consideration of environmental factors during the retrofitting process. This research aims to address 
knowledge gaps by utilizing a novel searching framework that employs an AI algorithm. It seeks to analyse the 
existing literature concerning the correlation between building types and energy-efficient retrofitting, including 
the influence of environmental factors on energy-saving strategies according to building’s typology. The goal is to 
identify crucial areas for future research and enhance the understanding of the relationship between building 
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SECTION D - ENVIRONMENTAL SUSTAINABILITY 


typology, energy efficiency, and retrofitting. 


2. RESEACH METHODOLOGY 


Highlighting the most recent developments in many areas of research is essential to ensuring progress and 
innovation in those areas. However, with an overwhelming number of publications, it becomes challenging to 
thoroughly read and analyze each one. Ignoring them entirely is not a viable option either, as valuable insights 
might be missed. Therefore, there is a pressing need to develop a new search scheme that effectively filters 
publications, ensuring a comprehensive review without overlooking significant contributions. The methodology 
employed in this research involves a multi-step approach to ensure a comprehensive and robust review of relevant 
studies (Figure 1). 


The research begins by conducting a SLR proceess and develop an algorithmic gap spotting framework, which 
serves as a fundamental aspect of the process. This framework encompasses the formulation of effective search 
strategies and the establishment of stringent study selection criteria. By implementing this approach, the research 
aims to identify and address gaps in existing literature, enhancing the overall quality of the review. Following the 
development of the framework, the quality of the studies included in the review is thoroughly evaluated. This 
evaluation focuses on determining the probability of bias and assessing the reliability of the supporting data. By 
conducting this assessment, the research ensures that only high-quality studies are considered in the analysis, 
enhancing the credibility and validity of the findings. Based on the results obtained from the algorithmic gap 
spotting framework and study evaluation, this research aims to identify gaps in the existing literature and design 
future study. These gaps indicate areas where further investigation is needed to address unanswered questions or 
explore novel perspectives. By identifying these research gaps, the study aims to contribute to the advancement of 
knowledge in the field. 
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Fig. 1: Research methodology 


2.1 Systematic literature review 


To fully address a research topic, this study used (SLR) technique. The methodology employed in this publication 
for conducting (SLR) involves two main steps for defining keywords. Firstly, database selection was performed, 
and subsequently, a search strategy was developed. This process resulted in the identification of 402 relevant 
publications that were then selected for evaluation and analysis. The systematic literature review (SLR) process 
identified 60 relevant papers using PRISMA methodology (Figure 2). This approach provided valuable insights 
into energy efficiency, particularly building typology's role, crucial for decision-makers and designers. 
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Fig. 2: Search strategy framework 


2.2 Algorithmic gap spotting 
The research questions addressed in this section are: 


RQ1. How much publications have there been about relationship between building type and energy efficiency 
through retrofitting since 2011? 


RQ2. What are the limitations of current research in this area? 


In order to answer the research questions, this study reviewed literature related to construction building technology, 
civil engineering, green sustainable science technology and environmental science fields to catch the most relevant 
articles. This study introduces a novel approach to addressing the challenge of search strategy for identifying 
research gaps in existing literature. By leveraging AI algorithms, NLP techniques, and data analysis, a strategy 
called algorithmic gap spotting (algorithmic gap roadmap) is employed. This method offers an automated and 
systematic way to identify areas of research or knowledge where there are gaps, enabling researchers to guide 
future studies, recognize biases and limitations, and foster innovation in various fields (Figure 3). 


Algorithmic gap spotting involves the utilization of computational tools to analyse and interpret large volumes of 
published research papers, articles, and other relevant documents. By applying AI algorithms and NLP techniques, 
patterns and trends within the data can be identified, such as keyword frequency, co-occurrence of terms, and the 
distribution of topics across different domains. 
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Fig.3: Algorithmic gap spotting design process 


3. RESEARCH FINDINGS AND REVIEW RESULTS 


As mentioned before, building retrofitting is an effective approach to reducing energy consumption and carbon 
emissions. However, it is a complex process that requires consideration of various factors. Despite its potential 
benefits, there is still a lack of information on factors that impact retrofitting solutions. This research aims to 
address these gaps by conducting a comprehensive literature survey. Reviewed publications in the previous section 
identified two main areas that remain a gap in the literature, which are building retrofit assessment according to 
the typology of building and building retrofit assessment according to environmental factors. This paper discusses 
the importance of addressing these gaps and presents recommendations for future research in these areas. 


3.1 Building retrofit assessment according to the typology of building 


Building typologies play an essential role in achieving energy performance requirements of buildings. By 
considering building typologies, a comprehensive understanding of a building stock's energy efficiency can be 
gained, making it an indispensable tool in ensuring sustainable and energy-efficient buildings (Y. Li et al., 2019). 


Numerous pieces of evidence have been identified through the analysis of architectural typologies related to energy 
in the European Union, at both national and regional levels. Typological data and criteria are being used to develop 
informational materials and provide energy advice for buildings (Dascalaki et al., 2011a). Moreover, typical 
residential building typologies are also being employed as tools for modelling the energy efficiency of building 
portfolios to promote local or national energy-saving strategies (Ballarini et al., 2011la). The main purpose of 
building typology is to determine the best energy-efficiency techniques to implement in existing structures and 
quantify the potential energy savings and CO2 emission reductions associated with the implementation of energy 
refurbishment measures in the building stock at various scales (Fernandez-Luzuriaga et al., 2021). 


According to (Sugar et al., 2020), building typology also plays a crucial role in determining a building's energy 
consumption. For instance, the heating energy demand of a building depends on its architectural style, and typology 
can be used to calculate a building's heating energy requirement. The main objective of this research is to present 
a study through literature related to the connection between sustainable retrofitting and building typology. The 
process systematic search method for retrofit decision-making intends to provide thought-provoking insights into 
the shortcomings and outlines the most important directions for future research. 


3.2 Building retrofit assessment according to Environmental factors 


The influence of the surrounding environment on building heating energy consumption has been recognized as a 
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critical factor in addition to the physical condition of a building. While the latter factors have been extensively 
studied, (Song et al., 2020) highlights the importance of urban morphology and climatic conditions in determining 
the overall heating demand in buildings. There are various global environmental assessment schemes that evaluate 
the impact of projects on different factors related to sustainability (Del Rosario et al., 2021). This section tries to 
investigate and comprehend the environmental aspects that influence the energy efficiency of buildings. It 
highlights a requisite for additional research to advance more precise and comprehensive assessment frameworks 
that encompass the environmental sustainability of retrofitting strategies. 


3.2.1 Building retrofit assessment according to the climate conditions 


The classification of buildings varies depending on the climate condition of the region and retrofitting strategies 
must be tailored to specific climate conditions and building types to ensure their effectiveness (Boardman, 2007). 
The primary aim of building typology is to create structures that are responsive to their environment while 
maximizing the use of resources available (Kirkegaard & Foged, 2011) (Tompkins & Adger, 2003). Retrofitting 
strategies towards energy-efficient buildings in specific climate conditions may have a common target; however, 
they differ in their strategies. Numerous studies conducted in various locations highlight retrofitting 
solutions tailored to their specific climate conditions, which might not be applicable to other regions. 


The decision-making process for retrofitting buildings can be significantly impacted by the availability of 
retrofitting alternatives that are specially created for various climate zones (Liu et al., 2022). To increase the 
adoption of energy-efficient retrofitting solutions and reduce greenhouse gas emissions, it is advised to develop 
and promote options tailored to local climatic conditions, while considering the typology of the building. By 
adopting climate-specific retrofitting strategies, energy efficiency can be significantly improved by taking into 
consideration the unique weather conditions of a region. 


3.2.2 Building retrofit assessment according to the surrounding environment 


Besides climate factors, there are other environmental factors that are essential for reducing greenhouse gas 
emissions and improving energy efficiency in cities (Bouw et al., 2021). For instance, the architect must consider 
solar and daylight availability to optimize solar energy production and minimize environmental impact in design. 
By integrating sustainable and passive design solutions with active solar energy systems, cities can reduce their 
reliance on non-renewable energy sources and promote a more environmentally sustainable future (Webb et al., 
2016). Many energy models have been developed recently, but they tend to neglect the importance of phenomena 
that occur at the urban scale, such as the effect of urban geometry on energy consumption (Mirzabeigi & Razkenari, 
2022). 


In conclusion, the surrounding environment and building design that considers solar and daylight availability are 
crucial factors in reducing greenhouse gas emissions and improving energy efficiency in buildings. While various 
energy models have been developed, they often neglect the impact of the surrounding environment on energy 
consumption. Therefore, it is important to consider factors such as building height, the density of the building in 
urban design, shape factors and etc, to optimize energy usage and minimize environmental impact. By integrating 
sustainable and passive design solutions with active solar energy systems, buildings can reduce their reliance on 
non-renewable energy sources and promote a more environmentally sustainable future. Overall, these findings 
emphasize the need for integrated approaches to urban planning and building design that prioritize environmental 
sustainability and energy efficiency. 


4. DISCUSSION OF RESULTS 


In order to determine the limitations and contributions of building retrofit assessments with regard to building 
typology and environmental factors, a review of the relevant literature was conducted in this research. This 
research aims to identify and evaluate the most relevant publications concerning building retrofitting assessments, 
with a specific focus on their respective key areas. The methodology involves the selection of 60 publications, 
followed by a thorough analysis of their contents. In this section, this paper selects 27 publications that align 
closely with its goals and methodology, and delves into their assessment methods (see Table 1). In addition, this 
paper thoroughly assesses the selected publications, examining their typological and geometrical parameters as 
evaluated in these studies. Furthermore, it considers other parameters such as the exploration of different climate 
conditions, cost analysis, CO2 emissions, as performance metrics and various retrofitting alternatives. These 
factors are crucial to understanding the assessment of achieving low-energy retrofitting in residential buildings. 
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Table 1: Summarised Publications for Building Retrofit Assessment based on Focused Parameters 
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The findings indicate that most of the publications propose retrofitting alternatives as solutions to improve energy 
efficiency and sustainability in buildings. However, there remains a lack of understanding regarding various 
metrics and calculation methods for evaluating retrofitting solutions, potentially hindering the development of 
standardized and readily available approaches. The studies identify that no single retrofitting solution can address 
all challenges, while also raising awareness about the importance of adoptable retrofitting strategies. 


As observed in Table 1, the majority of the publications focus on implementing energy-saving techniques based 
on specific climate conditions and personalized retrofitting options. Nevertheless, it is crucial to recognize that 
certain limitations still require attention and resolution. Limitations in current liturature are outlined below: 
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Several studies have been conducted, particularly in the last decade, to assess the energy efficiency of 
dwellings and improve building retrofitting according to their typologies. Although, most have used the 
"Typology Approach for Building Stock Energy Assessment" TABULA report to provide methods for 
classifying housing typology in Europe. However, there is still a lack of comprehensive information 
regarding technical developments, building structures, building layouts, and their relations which could 
affect energy requirements in buildings. 


TABULA report offers two levels of refurbishment for each typology, usual and advanced refurbishment. 
According to these refurbishment techniques, there is only one recommendation for each level but no 
elaboration on other refurbishment alternatives for each typology. 


The physical characteristics of buildings are regularly and significantly altered over time, changing not 
only the parameters of urban areas but also their physical characteristics and compositions. Therefore, 
there is a need to develop and amend adoptable retrofitting solutions based on building typologies. 


The energy-efficient building design methods have limitations when applied in regions with diverse 
climate characteristics. The method may not account for microclimatic conditions, extreme weather 
conditions or changes in building usage or occupancy patterns. It may not be suitable for regions with 
different building types or sizes and may require extensive data collection and processing. 


Numerous methods and tools have been created globally for assessing the energy efficiency of buildings. 
However, each of these methods and tools is different in its own way, and there is no consensus on how 
to score or weigh them. Furthermore, there is a shortage of building environmental assessment methods 
for retrofitting stages and approaches for determining carbon emissions and benchmarking are not 
consistent. 


5. FUTURE STUDY DESIGN 


In order to address the limitations observed in current liturature and develop more effective energy-efficient 
building retrofitting, a comprehensive approach is proposed for future studies. 


To address the lack of comprehension regarding the intricate connections among building layouts, 
technical progress, and energy demands, an in-depth building classification can be undertaken. This 
entails not only categorizing building typologies but also delving into the details of their structural 
compositions, architectural designs, and evolving technological aspects. 


Expanding on the predefined retrofitting solutions, a future study could focus on developing retrofitting 
alternatives tailored to various building typologies. This could involve an in-depth exploration of 
alternative refurbishment techniques precisely suited to specific building typologies. By considering an 
array of innovative materials, construction methods, and emerging technologies, researchers could 
propose retrofitting strategies that cater to the unique characteristics of each typology while also 
optimizing energy efficiency and sustainability. This approach would provide a richer set of retrofitting 
solutions for architects, designers, and stakeholders to choose from, ensuring a more adaptable and 
effective retrofitting process that aligns with diverse building needs and environmental contexts. 


To propose an adaptive retrofitting solution framework that aligns cohesively with diverse building 
typologies and could respond to their evolving physical features or environmental contexts. By 
incorporating advanced assessment methodologies, responsive strategies, and a unified assessment 
framework, the proposed adaptive retrofitting solutions aim to enhance the long-term performance and 
environmental compatibility of buildings. 


6. CONCLUSION 


Building retrofitting assessments have recently gained a lot of attention from researchers. Since 2019, the number 
of published works on this topic has increased significantly. The goal of this study is to thoroughly examine the 
available literature on the relationship between building typology and energy efficiency, with a particular emphasis 
on retrofitting. In addition, this study attempts to identify research gaps and plan a future study on the most 
effective energy-saving solutions for retrofitting various types of buildings while taking specific environmental 
and physical aspects into account. Based on the review of journal articles (n = 60) between 2011 and 2023, this 
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study summarized: (1) building retrofitting, (2) energy efficiency improvement, and (3) building typology. The 
main findings of this study include the following: the current body of existing literature on building retrofits has 
primarily focused on classifying building typologies and tailoring retrofitting solutions accordingly. However, 
there is a significant gap in knowledge regarding technical advancements, building structures, environmental 
factors, and their relationship, as well as alternative strategies for executing standard or deep retrofitting and 
accurately predicting energy savings. The energy-efficient building design method based on climatic zoning has 
limitations when applied to diverse climates and building types/sizes, which necessitates the development of a 
comprehensive methodology and tool for transferring knowledge to support adoptable energy-efficient building 
retrofits. Addressing these deficiencies is crucial for developing responsive and adaptable solutions that are tailored 
to the unique characteristics of each building rather than relying on a generic approach. It would also facilitate 
designers and policymakers with relevant information on energy-efficient building retrofits to make informed 
decisions. 
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ABSTRACT: Building Information Modelling applied to civil infrastructure has opened up interesting scenarios 
for integrated management of existing infrastructural works. In the last few years Bridge Management System 
(BMS) have been increasingly used by infrastructure owners, based on different control systems: from stochastic 
methods, which make it possible to define a condition ratio (CR) starting from periodic inspections of bridges, to 
sensors for structural monitoring, which can originate a flow of information exchange between real artifacts and 
the digital model capable of activating effective reactive or planned responses in the operation and maintenance 
phase of the asset. 


The paper intends to outline a BIM-oriented process workflow, which from the creation of parametric objects for 
infrastructural works using Scan-to-BIM acquisition techniques and procedures, arrives at the implementation of 
information bridge models to manage both static data from scheduled inspections of technicians of defects and 
their severity according to specific guidelines, and dynamic data from incoming and outgoing sensors placed in 

the physical asset for real time monitoring towards analysis, supervision and control systems of the facilities owner. 

The defined process workflow will be applied to some case studies, related to bridges of different characteristics, 

outlining some directions for future developments. In detail the research showcases the tasks undertaken and the 
outcomes achieved on four selected bridge case studies, which are real and situated within the geographical area 
of the Tuscany region, Italy. The studied bridges are all still in use and hold historical significance, as they were 
constructed between two hundred and one hundred years ago. 


KEYWORDS: Bridge Management System; InfraBIM; HBrIM; Digital Twin, Scan-to-BIM, SHM, IFC. 


1. INTRODUCTION 


Generally, users perceive infrastructure projects as safe, and it’s uncommon for drivers of regular vehicles to doubt 
the safety of the bridge they're crossing (Santarsiero et al., 2021). However, factors like extreme environmental 
conditions, mechanical loads surpassing the design assumptions, extended operational durations, inadequate 
maintenance, and similar elements can significantly impact and jeopardize the structural integrity of bridges 
(Saback de Freitas Bello et al., 2022; Santarsiero et al., 2021; Zinno et al., 2022). 


After a series of incidents, including the Morandi Bridge collapse (Santarsiero et al., 2021), the Italian Government 
in 2020 enacted the “Guidelines on risk classification and management, safety assessment, and monitoring of 
existing bridges” through legislation (Ministerial Decree number 578/2020). The ministerial decree number 
204/2022 essentially reaffirmed the aforementioned guidelines, extending their temporal validity to forty-eight 
months, or until the end of the year 2024. Further complementing the Italian regulatory framework are the 
"Operational Instructions for the Application of the Guidelines for Risk Classification and Management, Safety 
Assessment, and Monitoring of Existing Bridges" proposed by ANSFISA and annexed to the ministerial decree 
number 204/2022. 


For the risks that older bridges run and for regulatory issues similar to those described above for the Italian case, 
in recent years, Bridge Management Systems (BMS) have gained much importance and numerous infrastructure 
management companies have adopted it. Particularly, those systems based on stochastic methods have gained 
prominence, allowing the determination of a Condition Ratio (CR) based on regular bridge inspections and on the 
detection of the defects of the bridges themselves. Bridge Management Systems (BMSs) are modular information 
systems with designated functions (de Freitas Bello et al., 2021; Woodward et al., 2001), including inventory 
compilation, preservation assessment, risk evaluation (including load capacity), operational management, cost 
estimation for maintenance strategies, deterioration prediction and associated costs, socio-economic importance 
analysis with budget constraints, maintenance priority setting, and multi-temporal budget tracking. BMSs, along 
with Structural Health Monitoring (SHM) techniques, are employed to assess bridges post-visual inspections by 
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specialized technicians. 


The majority of current BMSs employ a two-dimensional (2D) approach to record and store information without 
visual representation and dynamic integration (Li et al., 2023). Building Information Modeling (BIM), which 
involves creating and managing a comprehensive 3D model embedded with informative data for the entire 
lifecycle management of a specific asset (Kaewunruen et al., 2022), emerges as the most natural source of 
information and data storage for next-generation BMSs. 


In the field of the SHM it is now common practice to place sensors on the bridges that it is believed to have to 
check or monitor for what has emerged from inspections on the same. The type of sensors that can be installed 
varies according to the phenomenon to be monitored but also with respect to its purpose: it is possible to install 
different sensors for short, medium or long-term monitoring or to install sensors capable of sending an alarm when 
a certain threshold value is reached. Enhancing finite element method (FEM) analyses with continuous sensor data 
can bolster the reliability of studying degradation. Artificial Intelligence (AI), especially Machine Learning (ML), 
offers transformative potential for Structural Health Monitoring (SHM). ML techniques automate pattern 
recognition in sensor data, aiding defect detection and risk assessment (Malekloo et al., 2022; Zinno et al., 2022). 


Considering AI as a predictive technology that utilizes data and "experience" to formulate forecasts, which in turn 
serve as inputs for the decision-making process, it can be asserted that the initial solutions conceived and applied 
to date across various fields, including Structural Health Monitoring (SHM), have been “Point Solution” where AI 
has replaced previous predictive tools (Agrawal et al., 2018, 2022). However, it is reasonable to anticipate a 
substantial paradigm shift in the medium term within this sector, as well as others. Stemming from the concept of 
reducing the cost associated with forecasting, a principal economic facet introduced by contemporary Machine 
Learning (ML) algorithms, a complete reorganization of SHM is foreseeable. Referring to this form of Al-based 
solution as a “System Solution” is appropriate. 


The Italian Guidelines also envision the utilization of digital technologies for their "intelligent" administration, 
achieved by integrating sensors into SHM systems and constructing informative models of structures. This is 
regarded as a step towards the realization of the National Digital Archive of Public Works (AINOP). 


A digital model integrating geometric and performance data aids SHM and aligns with the "Digital Twin" concept 
of Smart Manufacturing and infrastructure research. In the AECO sector, the Digital Twin concept is tied to 
Building Information Modelling (BIM). In the realm of bridges, the term "BIM" is often replaced with: InfraBIM 
(Osello, 2019), BrIM (Barazzetti et al., 2016; Saback et al., 2022), HBIM (Barazzetti et al., 2016; Borin & 
Cavazzini, 2019; Leon-Robles et al., 2019; Murphy et al., 2011; Stavroulaki et al., 2016), HBrIM (Leon-Robles et 
al., 2019). 


In the field of built heritage, surveying plays a crucial role in comprehending structures. Contemporary techniques 
such as laser scanning(Boardman et al., 2018; Leon-Robles et al., 2019; Pritchard et al., 2017) and photogrammetry 
(loli et al., 2022; Jáuregui et al., 2006; Mohammadi et al., 2021) are widely employed, utilizing both ground-based 
tools and UAV systems. These methods generate datasets in the form of point clouds (PC), which then require 
further processing to create Building Information Models (BIM). Known as Scan-to-BIM, this process is well- 
documented in the literature(Croce et al., 2023; Roggeri et al., 2022; Sing et al., 2022; Wang et al., 2019). 


The surveys and consequently the point clouds can serve at different stages in the useful life of a bridge: 


e They can form the database foundation to create a BIM model if it doesn't already exist. 

e They can be work an AS IS representation of the structure. 

e Inthe capacity of AS IS, they can be used for comparisons with a BIM model representing a situation 
prior to the survey. 


When a Building Information Model (BIM) of an existing bridge is being developed, several factors come into 
play that strongly affect the modeling process. These factors include the bridge's characteristics, how easy it is to 
access the bridge for data collection, and the availability of detailed design plans or information about the bridge's 
original construction and its current state. All of these elements have a significant impact on how the BIM model 
of the bridge is created and how accurate and comprehensive it can be. The process of creating a Building 
Information Model (BIM) from scanned data is easier for bridges with massive components like masonry 
structures. On the other hand, this process is more complex for bridges made of metal trusses. The complexity 
arises because metal truss bridges consist of many intricate elements with edges and corners that are difficult to 
accurately capture using laser scanning and photogrammetry techniques. It's easier to model massive bridges using 
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scan-to-BIM methods compared to complex metal truss bridges due to the challenges of capturing detailed data. 


With the widespread adoption of BIM as a methodology for building modeling, the IFC data schema, Industry 
Foundation Classes, has gained importance. Leveraging the principles inherent in Object-Oriented Programming 
(OOP), IFC presents a novel shared language across the entire Architecture, Engineering, Construction, and 
Operations (AECO) sector. IFC is not a replacement for the language of technician draw nor an evolution thereof; 
rather, it serves as a schema enabling the transmission of comprehensive information about a construction. This 
logically structured data extends far beyond mere geometric representation. As described above, IFC has become 
an essential component of knowledge incorporated into all leading university programs within the sector. With the 
upcoming version of IFC, the schema will expand into the domain of infrastructure, which until now has been only 
partially representable or represented using non-conventional methods. 


The topics addressed by this research are relevant to Cluster 3 of the Horizon Europe 2021-2027 program and also 
dealt with in the 2021-2027 National Research Plan (PNR), in line with the objectives of Goals 9 and 11 of the 
2030 Agenda of United Nations Organization. 


2. MATERIALS AND METHODS 


In recent years, the intersection of engineering, digital technology, and heritage preservation has paved the way 
for innovative approaches in the study of historic structures. This research article presents a comprehensive 
examination of the digital documentation and structural monitoring of four distinct historic bridges. By employing 
Building Information Modeling (BIM) techniques, each bridge was meticulously captured in a virtual environment, 
enabling a detailed analysis of its architectural and structural features. Furthermore, this study explores diverse 
strategies for the digital representation of these historic constructions and the implementation of structural 
monitoring solutions. 


The preservation of historic bridges holds significant cultural, historical, and engineering value. Through the 
integration of BIM methodologies, these bridges can be accurately documented and analyzed, facilitating the 
development of effective strategies for their maintenance and conservation. The four selected case studies serve as 
tangible examples of this interdisciplinary approach, shedding light on the challenges and opportunities that arise 
when dealing with the intricate balance between preserving heritage and ensuring structural integrity. By 
examining the challenges and successes encountered in the digital documentation and structural monitoring of 
historic bridges, this study aims to inform best practices and inspire further advancements in the field. 


Fig. 1, 2: Masonry bridge over Masera Ditch | Laser scanner survey. 


2.1 Case Study 1: Masonry Bridge over the Masera Ditch 


The masonry bridge over the Masera Ditch is situated in the Crespino over Lamone-Biforco section of the railway 
line connecting Borgo San Lorenzo to Faenza. The “Faentina” Railway is a state-owned railway line that connects 
Florence to Faenza via Borgo San Lorenzo (fig. 1). Its construction took place between 1880 and 1893, with the 
idea originating as far back as the 1840s. This railway was closed for an extended period due to significant damage 
sustained during World War II. Traffic resumed partially in the 1950s, but the line experienced another closure in 
1971. The gradual reopening began in the 1990s and was completed in the early 2000s. Essentially, it’s a sparsely 
utilized line, hence its lack of electrification and the presence of a single track. 
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The bridge over the Masera Ditch is a masonry structure with five arches, situated on a curved path. At its highest 
point, the structure stands 40 meters above the ground. Adjacent to the bridge, both upstream and downstream, are 
two galleries manually excavated from the rock. 


2.1.1 | Documentary research and survey campaign 


No original bridge designs have been found, nor have any other technical drawings of any kind been located. Near 
the bridge, there is an access point to regular road traffic, and it’s also possible to ascend the mountain to reach 
little clearings at the bridge's height, away from the rail track. Given the scenario described in the preceding 
paragraph, a significant portion of the survey was conducted without the presence of surveillance personnel. 
However, this did involve a temporary suspension of railway traffic. 


The survey campaign was conducted using a Z+F IMAGER 5016 Laser Scanner, which also captured RGB data 
(fig. 2). Additionally, photographic documentation was captured using a Sony H300 Camera with 35x optical zoom. 
The point cloud datasets were captured in .fls format and subsequently imported into Autodesk Recap® software. 


Fig. 3, 4: VPL script for arches modelling | Views of the BrIM model 
2.1.2 BIM Modeling and Implementations 


Given the absence of technical drawings for the bridge, in the case study of the Masonry Bridge over the Masera 
Ditch, a BIM modeling was carried out using a conventional Scan-to-BIM workflow. 


The initial step taken towards modeling the viaduct involved refining the provided point cloud data. This was 
achieved by removing portions of the surrounding vegetation and segmenting the point cloud into three parts, 
aimed at reducing the file size to expedite the modeling process. Subsequently, these three segments of the point 
cloud were integrated into Revit to initiate the modeling process. The bridge has been decomposed into its 
constituent elements based on the methodologies employed by the management company of the Italian railway 
line, similarly to what was done for the case study of the metal girder on the Osa river. The three segments of the 
point cloud were successively integrated into the Revit environment, and reference grids were generated from 
them. These grids were then utilized to position individual components in subsequent stages. 


Chronologically, the first elements that were modeled are the viaduct piers. For the BIM component describing 
the piers, the parameters that have been made parametric include the height, dimensions of the upper rectangular 
base, and the various inclinations characterizing the short and long sides of the pyramid. Based on these dimensions, 
the dimensions of the lower rectangular base were then defined. Furthermore, the material with which the piers 
were constructed has been made parametric and based on observations of images and the point cloud, masonry 
was chosen as the material. The arch surface was generated using the Dynamo platform, as the represented form 
exhibits a complex double curvature that is challenging to model otherwise (fig. 3). Initially, commands were 
implemented in Revit to select the edges on which the arch relies. Subsequently, median points of the segments 
were placed to create a central line. Alongside two other lines positioned at the ends of the selected segments, these 
components facilitated the creation of the arch surface. 


The subsequent step involved establishing control points on the previously generated lines. Each control point was 
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defined with its respective height relative to the arch springing level and the setback. Following this, a curve was 
constructed. Finally, after executing these procedures for the three segments that define the curve, the surface was 
generated and assigned a thickness. Ultimately, the thus-formed geometry was incorporated into the Revit project. 
The abutment and backfill were modeled using the "Generic Models" category. Both components were initially 
modeled as solid forms, and subsequently, the shape of the arch was subtracted using an empty arch model. For 
these elements, in addition to the parameters defining their shapes, a material attribution parameter was introduced. 
For the abutment, the material is masonry, while for the backfill, it is railway crushed stone (fig. 4). 


2.2 Case Study 2: Giorgini Bridge over the Bruna River 


The presented case study revolves around the "Giorgini Bridge" in Castiglione della Pescaia (Grosseto, Tuscany 
Italy), build from 1827 to 1828 as part of the Maremma reclamation works (fig. 5). Designed by mathematician 
and civil engineer Gaetano Giorgini, the bridge was constructed across the Bruna River. The bridge featured three 
floodgates, aimed at preventing the intermingling of Bruna River's freshwater with the saline seawater. The 
construction, initiated in 1827, aimed to address the belief that this mingling caused malaria, belief of the time 
which later proved to be scientifically unfounded. The bridge, 26 meters wide and 12 meters high, is composed of 
lateral shoulders, lowered round arches, and pylons. Positioned between these pylons were three floodgates 
enclosed in oak and framed with metal. These floodgates rotated on iron pins, closing manually or automatically 
via high-tide currents, preventing seawater intrusion into the marsh. During low tide, the force of the lake water 
facilitated the opening of the floodgates, discharging water into the sea. 


Fig. 5, 6, 7: View of Giorgini Bridge | Digital photogrammetry from drone | Views of the BrIM model 
2.2.1 Documentary research and survey campaign 


The investigation involved multiple methodologies: a laser scanner was utilized for acquiring geometric data, and 
a drone facilitated a photogrammetric survey. This was further supplemented by a topography network (polygon) 
for survey phases. The approach encompassed a sequence of steps, starting with a total station and GPS survey, 
followed by drone-based photosets creation, and concluding with the generation of a point cloud through a laser 
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scanner. 


The instruments employed in the survey campaign are: a SOKKIA SET3 130-R3 total station, a TOPCON GR-3 
GPS receiver, optical prisms, a DJI 3 Phantom UAV, and a FARO FOCUS CAM2 laser scanner. The drone flight 
was organized using the Altizure® application, enabling the definition of the GPS-coordinated flight path. 
Subsequently, the photographic dataset was post-processed with Agisoft PhotoScan® software to generate a point 
cloud. Meanwhile, the laser scanning dataset underwent processing using Autodesk Recap® software (fig. 6). 


The comparison in terms of “deviation of the cloud of points”, to evaluate the difference between the two clouds 
generated using different data acquisition technologies, was performed with Cloud Compare open-source software. 
The original design drawings were retrieved, and their accuracy was subsequently verified during the BIM 
modeling process. This validation was achieved by comparing them with the point clouds obtained from the 
previously described survey campaign. 


2.2.2 BIM Modeling and Implementations 


To develop the H-BIM model representing this unique historical infrastructure, a semantic deconstruction was 
required. This involved defining individual BIM components ranging from primary structural elements to intricate 
detailing components (fig. 7). 


Both point clouds were imported into the chosen BIM modeling software, Autodesk Revit®. By concurrently 
utilizing the point clouds and the original design drawings, each component of the structure was meticulously 
modeled one by one. The metal components of the bridge were primarily modeled using the original design plans, 
whereas the masonry components were predominantly elaborated based on the point clouds. 


2.3 Case Study 3: Metal Truss Bridge over the Osa River 


The subsequent case study pertains to a dual-track metal truss railroad bridge that dates back to the early 1900s. 
This bridge remains operational, spanning the Osa River in Italy as part of the Albinia-Talamone railway section 
on the Rome-Grosseto line, which is managed by RFI (Rete Ferroviaria Italiana), an Italian railway company (fig. 
8). The metal bridge boasts a 42-meter span and features abutments constructed using a composite material of 
masonry and concrete. Notably, a diverse array of components deviating from standard commercial metal profiles 
can be observed. These elements consist of a combination of "plate" profiles, each implemented with distinct 
configurations, and they incorporate supplementary reinforcement plates in the areas subjected to the highest stress 
levels. The connections between disparate components were forged using plates and secured with hot-riveted nails. 


AAPAN TAN 


Fig. 8, 9: View of metal bridge over OSA River | Perspective view of the BriIM model 
2.3.1 Documentary research and survey campaign 


The original design documentation for the metal bridge has been retrieved, comprising seven technical drawings. 
Additionally, there are documents pertaining to the materials used in the bridge, the conducted load tests, and other 
activities related to the construction phases. The survey campaign was executed utilizing the Faro® Focus 3D x 
330 Laser Scanner. The survey operations were meticulously scheduled to coincide with intervals when train traffic 
was absent. At the time of the survey, the riverbed exhibited cleanliness and optimal condition. The point cloud 
datasets were captured in .fls format and subsequently imported into Autodesk Recap® software. This software 
facilitated the alignment procedures for the distinct scans acquired during the campaign. 
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2.3.2 BIM Modeling and Implementations 


The bridge has been conceptually broken down into constituent elements following the procedures utilized by the 
management company of the Italian railway line. Given the presence of detailed original designs, the BIM 
modeling of the various bridge components was based on these drawings. The point cloud obtained as a result of 
the survey campaign was used as an "AS IS" comparison for the modeled components. The comparison was 
possible with the Cloud Compare open-source software (fig. 9). 


The chosen modeling workflow was partly influenced by the availability of the original technical drawings and 
partly dictated by the complexity of the metal truss. This truss comprises a high number of components that include 
edges, vertices, perforated elements, and other geometric intricacies, making its survey using a Laser Scanner 
challenging and difficult, along with the subsequent generation of a point cloud representation. The software used 
for BIM modeling was Autodesk Revit®. Given the uniqueness of the bridge components, a family was created 
for each type of element, and a report was produced for each of them. 


The concept underlying the breakdown into "elements" logic, as provided by the Italian railway company, aligns 
with the D.O.M.U.S. Software. This software aids maintenance engineers in the railway infrastructure sector in 
evaluating the condition and preservation status of bridges. A dedicated master dataset is employed for cataloging 
both the structures and their identifiable defects. These defects are linked to corresponding indices, facilitating the 
algorithm in forming assessments. 


During the inspection visit, engineers identify and photograph each defect present on the bridge, utilizing official 
catalogs and documenting the specific bridge component affected by each defect. Based on the previously 
mentioned data encompassing the geometric configuration of the bridges and inspection outcomes, the algorithm 
computes some indices. These indices are associated with a particular level of defectiveness, aligning with the 
established protocol for inspecting railway structures. Through assigning level of defectiveness, the bridges, 
infrastructure management gains the ability to ascertain priority interventions and make well-informed choices 
concerning measures for mitigating risks. 


The operationalization of the management software functionalities was achieved through a synthesis of Python 
code scripts utilized within the visual programming environment (VPL) Dynamo. These scripts were integrated 
with Excel spreadsheets and an Access database, encompassing various tables and numerous queries, the latter of 
which were scripted directly in SQL language. The Dynamo-developed algorithm is composed by several 
interconnected node clusters in groups, each of them serving distinct functions. These functions include querying 
the BIM model for crucial input, exporting datasets to Excel, enabling automated interactions between Excel and 
Access, reading and importing externally processed data, and ultimately recording achieved results — or macro- 
outputs — within pertinent fields in BIM environment (fig. 10). 
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Fig. 10: Data flow. From macro-inputs to macro-outputs 


The implemented management functionalities within the BIM environment enable railway engineers to record 
bridge defects directly in the BIM environment, execute the Dynamo algorithm, and access the calculated indices 
in the same software. Finally, an IFC model of the metal truss over the Osa River was generated in accordance 
with the IFC4 standard. In this model, bridge components were categorized as follows: 


e all beam-like components have been converted into IfcBeam instances; 
e flat metal profiles have been translated into IfcPlate instances; 
e bolts have been categorized as IfeMechanicalFastener instances; 
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e masonry abutments are classified as IfcBuildingElementProxy instances; 
e supports have been transformed into IfcBuildingElementProxy iinstances. 


With the upcoming release of IFC 5 and its corresponding implementation by commercial software, it will be 
possible to more accurately map the elements of an infrastructure. 


2.4 Case Study 4: Toppoli Bridge over the Arno River 


The presented case study pertains to the Toppoli Bridge located over the Arno River near Bibbiena in the Province 
of Arezzo (Tuscany, Italy) (fig. 11, 12). This bridge, originating from the early 1900s, was constructed using 
traditional building techniques. The structure comprises a substantial masonry construction featuring a dual arched 
span. Each arch possesses dimensions of 19.60 meters in length, 3.00 meters in height, and 5.00 meters in width. 
The piers and abutments are constructed using square stone masonry, while the arches are composed of brick 
masonry. In the approximate timeframe of the 1960s, a noteworthy event involves the expansion of the road deck. 
This expansion entails a reinforced concrete slab measuring around 7.50 meters in width and 0.25 meters in 
thickness. This slab extends out as a cantilever from beneath the masonry arches. 


Fig. 11, 12: Views of Toppoli Bridge over the Arno River 


The bridge was part of a 2019 experiment in multilevel methodology, conducted through a collaboration between 
the Tuscany Region Administration and the Regional Federation of the Orders of Engineers of Tuscany. This 
initiative aimed to analyze and inspect priority bridges efficiently. The experience prompted the authors to develop 
an innovative BIM-centered approach for bridge risk management and monitoring. 


2.4.1 Documentary research and survey campaign 


No archival records of Toppoli Bridge original design have been discovered. The acquisition of geospatial data 
was achieved through a laser scanner survey, facilitating the creation of point clouds compatible with the chosen 
BIM authoring software, Autodesk Revit®. Prior to importing, the point clouds underwent necessary adjustments 
within Autodesk ReCap® software. This included aligning the scans into a unified model and reducing the point 
count to eliminate redundancies and the noise. The point cloud was then stored in .rcp format for seamless 
integration into Autodesk Revit® software. 


2.4.2 BIM Modeling and Implementations 


The preliminary BIM model was established utilizing system families, followed by the development of customized 
loadable families to represent specific components. Throughout this process, the aim was to adhere to the 
classification structure outlined in the CNR classification of masonry bridges. To model individual components, 
the point cloud was segmented, component by component, using the open-source software Cloud Compare. Each 
point cloud segment was then employed for BIM-based modeling of the corresponding component, either directly 
or by extracting sections in .dxf format (fig. 13). 


Utilizing the DB-Link plug-in, model information from the Toppoli Bridge BIModel was exported and integrated 
into Microsoft Access software. This facilitated real-time updates reflecting any alterations made to the model. 
Any changes made in either Revit or Access were synchronized seamlessly between the two. Moreover, 
modifications to the database could be propagated back to the BIM model in Revit. 


A BIM-centric system was established to manage sensor data, with a focus on accelerometers. In the case study, a 
WIT-type accelerometer sensor capable of capturing angle, acceleration, angular velocity, and magnetic field along 
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the 3 XYZ axes was employed. The acceleration measurements exhibit an error rate of approximately 1%, with a 
capacity to record accelerations up to 16g. Extracted data can be exported in .txt format and imported into 
spreadsheets for generating graphs that depict accelerations along the three axes (fig. 14). 


Building on the earlier described implementation process, the default information model in Autodesk Forge® 
software was substituted with the Toppoli Bridge HBrIM model, functioning locally. This model integration 
includes all essential tools for querying and editing the model. Additionally, it offers a comprehensive view of both 
the overall model and individual elements. Reference “sprites” are created within the model, i.e. icons that simulate 
the position of the sensor in the artefact. The setting then of a “sprite” within the model allowed the visualization 
of data obtained from the sensor directly on the screen. Notably, the study's scope primarily concentrated on 
optimizing the workflow process and did not delve into the technical and operational management of sensors 
installed on the bridge. 


Fig. 13, 14: View of BrIM model | Representation of data by sensor device in a data sheet. 


3. RESULTS AND DISCUSSION 
3.1 Results 


The results of the topographic survey campaigns and experiments conducted for each distinct case study are 
presented below. 


1) Masonry Bridge over the Masera Ditch - The laser scanner survey campaign yielded excellent results, partly 
owing to the bridge's composition of easily distinguishable massive elements. RGB data was acquired. The 
workflow in this case was forced given the absence of technical documents. Using the Dynamo platform, a 
dedicated tool was developed to model elements characterized by double curvature, in this specific case the arches 
of the bridge. The tool described above produced the desired results and can be reused in similar situations, which 
are often encountered in the study of the heritage of Italian historic bridges. 


2) Giorgini Bridge over Bruna River - The point cloud generated by the laser scanner comprise a total of 19 million 
points, while the one generated by UAV photogrammetry reached the size of 14 million points. The comparison in 
terms of standard deviation between the two point clouds, conducted using Cloud Compare, yielded optimal results. 


3) Metal Truss Bridge over the Osa River - The survey campaign on the metal truss proved to be challenging due 
to its complexity. Given its numerous components and the sharp angles that define them, acquiring the bridge's 
geometry using laser scanners is hindered by the inevitable presence of “shadowed areas” in the acquired dataset. 
The chosen modeling workflow was partly influenced by the availability of the original technical drawings and 
partly dictated by the complexity of the metal truss described above. 

The implemented management functionalities within the BIM environment enable railway engineers to record 
bridge defects directly in the BIM environment, execute the Dynamo algorithm, and access the calculated indices 
in the same software. In the generated IFC model, bridge components were categorized as follows: 

- all beam-like components have been converted into IfcBeam instances; 

- flat metal profiles have been translated into IfcPlate instances; 

- bolts have been categorized as IffMechanicalFastener instances; 

- masonry abutments are classified as IfcBuildingElementProxy instances; 

- supports have been transformed into IfcBuildingElementProxy instances. 

With the upcoming release of IFC 5 and its corresponding implementation by commercial software, it will be 
possible to more accurately map the elements of an infrastructure. 
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4) Toppoli Bridge over the Arno River - The laser scanner survey campaign produced outstanding results, in part 
due to the bridge's makeup of clearly distinguishable large components, similar to what was previously noted for 
case study 1). A two-way connection between the BIM model in Revit, viewed as a collection of elements, and a 
dedicated database created in Microsoft Access has been developed and tested, yielding satisfactory results. A 
workflow was tested and optimized to connect a sensor, specifically an accelerometer, with the BIM model of the 
Toppoli Bridge. The objective was to visualize the sensor’s output on the Autodesk Forge platform. Historical 
sensor data was used in the test; nevertheless, it is possible to achieve the same result with real-time data using a 
more powerful software version and a connected sensor. 


3.2 Discussion 


The present study aimed to offer a contribution to the development of Structural Health Monitoring systems in the 
context of transport infrastructures and bridges in particular, in line with the recent orientations of the technical- 
scientific community, both at the national and European level, on risk management and assessment. 


The application of Building Information Modelling tools and methodologies was the first step in the information 
management of bridge knowledge. The obtained BIM models for the various studied bridges demonstrate the 
reliability of the scan-to-BIM methodology for modeling both infrastructure and bridges, just as it is for buildings. 
The conducted survey campaigns vary in terms of the instruments used and the surrounding conditions. 
Consequently, these experiences are valuable for identifying common factors that have influenced them: the 
intrinsic nature of the bridge and the materials of which it is made, the ability to suspend bridge usage, the size of 
the river or obstacle crossed by the bridges, the overall accessibility of the structure, and the economic and time 
costs associated with potentially repeating the survey campaign. 


The most significant factor influencing the choice of BIM modeling workflow is the presence or absence of reliable 
design documentation. In the case of historical bridges, a survey campaign is essential. However, for some cases, 
the survey results become the primary or sole dataset for modeling, while in others, they serve only as a comparison. 
The processes of implementing bridge information models from the geospatial data acquisition phases conducted 
with laser scanning or 3D image surveying techniques show an adequate level of maturity for the intended and 
foreseeable uses in bridge control and monitoring activities. In particular, the use of scripts created in Visual 
Programming Language (such as Dynamo) into the BIM authoring software allows for the effective handling of 
complex shapes, which are derived from the geometric-constructive rules used in the design of railroad tracks and 
artwork in the late 1800s. 


At the interoperability level, it has been demonstrated that translating a BIM model of a bridge into the IFC schema 
is already feasible, although currently, there are occasional challenges in classifying certain elements. Geometric 
data, however, is readily translatable into the IFC data schema. On the contrary methods of managing data from 
bridge monitoring can vary widely depending on the criteria set by the owner, which based on its strategic goals 
and internal organizational structure may define different approaches in terms of Asset Management System and 
related supporting technological infrastructure. 


An initial strategy for enabling a “smart” approach to bridge risk assessment has been developed in the fourth 
study case, involving the implementation of a continuous data acquisition process through sensors installed on the 
physical structure. This approach leveraged the cloud-based Autodesk Forge platform, seamlessly integrated with 
BIM authoring software for information models. The functionalities of the corporate software assisting engineers 
in evaluating the safety and maintenance condition of railway bridges have been replicated within the BIM 
environment in the third case study. It was possible to provide information necessary to support decision-making 
regarding prevention and mitigation of natural and anthropogenic hazards, that pose a threat to the stability of the 
examined bridge and the integrity of the infrastructure network. 


4. CONCLUSION 


The conducted studies demonstrate the benefits achievable through the approach ‘BIM - first of all’, which 
prioritizes the use of BIM models at the core of bridge management processes for these infrastructures. 


The approach used in the proposed experiments, can be attributed to the economic concept of a “Point Solution”, 
whereas the more desirable approach for Bridge Management Systems (BMS) is certainly a “System Solution”. 
This implies a comprehensive rethinking of BMS within the context of BIM and AI integration, aiming for a 
comprehensive and holistic solution. 
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The topics covered occupy a prominent place in the National Research Plan 2021-2027, particularly in the areas 
related to security and digital innovation. This is in line with Cluster 3 of Horizon Europe 2021-2027 and supports 
goals 9 and 11 of the United Nations Agenda 2030, which focus on resilient infrastructure, innovation and 
sustainable urban development. 
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ABSTRACT: At construction sites, as-built management is generally conducted by taking pictures or surveying 
with total stations and comparing the images or survey data with design drawings or Building Information 
Modeling (BIM) models. Since this work is time-consuming and error-prone, more efficient and accurate methods 
using advanced Information and Communication Technology (ICT) are desired. Therefore, this research proposes 
a method that can efficiently capture the progress of construction by detecting each constructed structural member, 
such as beams, columns, connections, etc. In this proposed method, construction engineers first take many pictures 
of the construction site and conduct automatic image segmentation using a pre-trained Convolutional Neural 
Network (CNN) model. Next, point cloud data is generated from taken pictures by using Structure from Motion 
(S{M). Then, the point cloud data is semantically segmented by overlapping the segmented images and point cloud 
data using the pin-hole camera technique. Finally, the design BIM model and segmented point cloud data are 
overlapped, and constructed parts of the BIM model can be detected, which can be reported as as-built parts. A 
prototype system was developed and applied to an actual railway construction project in Osaka, Japan for testing 
the accuracy and performance of the system. 


KEYWORDS: Construction progress management, Instance segmentation, Point cloud, Building Information 
Modeling. 


1. INTRODUCTION 


Construction site management involves inspecting the completed parts of a construction project to ensure that the 
work is within specifications and contractual requirements. This task requires construction workers to compare the 
actual construction with the provided drawings and documents. The goal is to ensure that the construction is 
performed correctly and to calculate the corresponding contract price. Traditionally, construction management 
relied on drawings, but the use of 3D models has become more prevalent. These models enable better visualization 
and consensus building among stakeholders. While image data and laser scanners have been used in previous 
studies to create 3D models, large-scale structures and deep learning techniques have not been fully utilized for 
construction site monitoring. The succession of technical skills in the construction industry has been identified as 
an issue, prompting the need for changes in the construction production system. Leveraging technology 
advancements, such as 3D models and sensor information, has improved efficiency and contributed to various 
aspects of construction, including design, management, and maintenance. Building Information Modeling (BIM) 
is a lifecycle management system that facilitates efficient building maintenance. However, the process of collating 
3D models with 2D drawings is time-consuming and prone to human error. Structure from motion (SfM) is a 
method used to acquire 3D data of existing structures, but converting point cloud models to polygon models 
presents challenges such as removing unnecessary details and setting appropriate thresholds. Efforts are needed to 
develop more efficient methods for capturing the current 3D model of a structure. 


Recent advancements in deep learning and object detection technology have automated tasks such as construction 
site inspections, including identifying deformations and damages from images. The availability of large image 
datasets, such as ImageNet, has greatly improved object recognition accuracy using deep learning algorithms. In 
addition, recently, much research has been done for classifying point cloud data using deep learning (Charles et al. 
2017). However, much research is required to classify civil infrastructure members. 


Thus, this research has adopted a more simple 2D object detection method using deep learning and a pin-hole 
camera method and combined it with 3D BIM models to reproduce the construction situation on a 3D model and 
calculate construction costs. A training dataset specific to construction members was created to fine-tune existing 
deep-learning models. The proposed method enables efficient shape detection and attribute identification of 
construction elements and should contribute to the integration of detection information into 3D models, facilitating 
the creation of as-built models. 
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2. RELATED WORKS 


Before the advent of deep learning-based object detection, selecting features for object detection was challenging, 
especially in complex construction sites with various members and intricate structures. Past research in the field 
of construction has focused on automating tasks such as progress and productivity management using deep 
learning. One study proposed a system to automatically recognize completed parts in construction site images, but 
the detection results were not applicable to other systems, and accuracy was limited for complex-shaped structural 
members (Fathi et al., 2015). Research combining automation technology and BIM has aimed for efficient work 
(Kropp et al., 2018; Park et al., 2018), but perfect automation remains elusive due to the need for human 
intervention. 


Another study developed a management system using a 3D model and proposed a method for constructing original 
models by detecting structural members in existing bridges from point cloud data (Lu et al., 2019). However, the 
method faced challenges in detecting complex geometric structures such as concrete or truss bridges. A laser 
scanner is used to create detailed BIM models of existing facilities but encountered difficulties with complex 
structures and occlusion (Tang et al., 2010). Various approaches have been attempted to create 3D models of 
existing buildings (Bosche et al., 2009; Brilakis et al., 2010). 


Recent advancements in computer vision technology have enabled the automation of tasks performed by the 
human visual system. One study developed a system that automatically detects construction members in a room 
using 2D image data (Hamledari et al., 2017). Other studies have attempted to capture construction status and 
shape from image data (Gidaris et al., 2015; Khaloo et al., 2015). Perez-Perez et al. (2021) developed a method 
for the segmentation of indoor point clouds via joint semantic and geometric features for 3D modeling of the built 
environment. Pan et al. (2022) proposed geometric digital twins of buildings with small objects by fusing laser 
scanning and Al-based image recognition. However, the detection of different material members and multiple 
structure types for outdoor civil infrastructures remains challenging. Therefore, this research aims to fill this gap 
to improve the performance of as-built detection of civil infrastructures under construction for better construction 
site management. 


3. PROPOSED METHOD 


The proposed method aims to recreate the construction status in an as-built 3D model by incorporating the shape 
information from the detection result images obtained through a deep learning model. This allows for cost 
calculations without the need to match 2D drawings with the construction progress. The positional relationship 
between the detection result images, the completed 3D model, and a point cloud model generated using Structure 
from Motion (SfM) are matched. 
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Fig. 1: Method overview. 
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As shown in Fig. 1, the method consists of three main steps: (1) performing segmentation detection using a fine- 
tuned deep learning model to identify structural members in the construction images, (2) creating a point cloud 
model using Agisoft Metashape to determine the 3D positions of the images, and (3) integrating the completed 3D 
model, detection result images, and point cloud model in a volume detection system using the Unity game engine. 
The positional relationship is established, and the identified structural members are recorded in an Excel sheet. 
Finally, the construction status is reproduced in BIM software (Revit), and the attribute information of the 
structural members is used to calculate the construction cost at the time of shooting. 


3.1 SEGMENTATION 


In this study, the weights of a U-Net model trained on the Cityscapes Dataset (Cityscapes Dataset, n.d.) were 
adjusted to distinguish structural members and the background. By updating the weights of the 37 layers, the 
positions and attributes of the structural members in the captured images could be identified. These detection 
results were treated as the finished form, providing insights into the construction site's situation. Since there were 
no published trained models for construction members such as columns, beams, and ducts, a training dataset was 
created using interior photographs of buildings under construction. The dataset was manually annotated using 
Adobe Photoshop CS4, creating mask images for each target member. The existing trained model was then fine- 
tuned using the mask images and the corresponding color changes in the original images. 


In this study, a U-Net model trained on the Cityscapes Dataset was fine-tuned and used as a CNN for object 
detection to detect construction structural members from images, specifically targeting the five main structural 
members (Fig. 2). 


um Column me Girder 
SS joint MM Cross beam 
=m Floor slab 
512 x512 512x512 


128 x 128 


Input image 


Output image 
Deep Learning process 
Fig. 2: U-Net-based CNN structure and learning results (example). 


Meanwhile, a three-dimensional model representing the real space, including the camera position and target 
structure, was created using Structure from Motion (SfM) and Agisoft Metashape, a software for photogrammetric 
processing and 3D spatial data generation. Fig. 3 shows the results of fine-tuning U-Net using the created training 
dataset, where IoU stands for Intersection over Union. 


(a) Average accuracy of training model (b) Average loss of training model (c) Average IoU of training model 


Fig. 3: The results of fine-tuning U-Net using the created training dataset. 
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The deep learning model in this study does not exhibit over-learning. The average accuracy for the training dataset 
increases, but the average loss for the test training dataset does not decrease. The detection accuracy of the fine- 
tuned model is evaluated using the average IoU value, which is 0.6428 at the 300th epoch. The IoU value measures 
the overlap between the correct answer area and the predicted area, indicating the model's performance. The 
evaluation index IoU indicates the numerical value obtained by dividing the overlapping part of the correct answer 
area and the prediction area by the union part of both areas, as explained in equation (1). 


IoU (Intersection over Union) = (Intersection of detection areas)/(Union of detection areas) (1) 
3.2 POINT CLOUD MODEL CONSTRUCTION 


It is difficult to reproduce the positional relationship of the construction status when each image and model is 
imported into the game engine Unity in the lack of coordinate information. Therefore, we use SfM and Agisoft 
Metashape software to create a three-dimensional model that replicates the camera position and target structure in 
virtual space. Metashape allows us to process digital images and generate 3D spatial data, enabling the scanning 
of both small objects and large buildings. By analyzing the overlapping shooting locations in the photographs, we 
can calculate the distance to the subject in each photo. 


MSS 


Fig. 4: Transfer from real frame to SfM model. 


The gray model consists of a mesh overlaying a point cloud model created in Metashape, while the white object 
represents a virtual camera. As shown in Fig. 4, we can confirm the accurate reproduction of the camera's position 
and the target structure in Unity. Valid values for camera parameters such as position and rotation were confirmed 
in Unity, indicating successful reproduction of the real-world positions of the target structure and the camera in 
the game space. By overlaying the SfM model with the expected BIM model, a work detection system was created. 
The deep learning model performs object detection using the created mask image, adjusting the position and 
rotation coordinates of the virtual camera based on the mask image's parameters. The field of view on the virtual 
camera side is also adjusted to match the mask image. These preparations enable the replication of the construction 
situation in the game engine and the reflection of the deep learning model's detection results onto the BIM model. 


4. EVALUATION 


A case study was conducted on a building under construction at Osaka University to verify the proposed method. 
The system was tested by creating a BIM model from construction drawings and capturing photographs at the 
construction site, allowing for the verification of volume detection using a point cloud model. The system was 
implemented and tested using Unity on a standard PC with an Intel Core i17-3770K CPU and 32 GB RAM. 


The detection result image from the deep learning model is utilized as a filter to extract and choose the completed 
portion from the generated BIM model. By aligning the aspect ratio of the Unity camera with the actual image 
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size, the system excludes the undetected background area that is still under construction, ensuring only completed 
members are selected such as shown in Fig. 5. 


In order to apply the image filter to the application, multiple angled frames are captured and went through a deep 
learning model with different weights. Fig. 6 shows part of the detection results 


Fig. 7 shows the generated model in Unity and Revit. The detection results on the left if it indicates the accurate 
detection of completed ducts. However, upon examining the member model based on element ID, it was found 
that four beams on the front side of the third-floor slab were missing. Additionally, low detection accuracy is 
observed for elements not included in the learning dataset, such as overhead poles, scaffolds, and multiple electric 
wires, depending on the viewing angle of the target structure. On its right, the screen displays the selection of the 
element ID obtained from the viewpoint, with the selected member highlighted by the blue wireframe line and the 
unselected part shown by the black wireframe line. 


Table 1 displays the results of member detection from multiple viewpoints in the case study, including detection 
accuracy for each member and overall detection accuracy calculations. 


Camera position 
when taking photo 


mask image as a filter 
(Rays pass only in the 
white part) 


Predicted BIM model 
(Finished shape) 


Fig. 5: Image filter in BIM model construction. 


Fig. 6: Part of the results in image processing. 
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Fig. 7: Generated 3D model in Unity and Revit. 


Table 1: Structure member detection accuracy. 


Shooting Overall detection Column detection Floor slab detection Foundation detection Beam detection 
viewpoint accuracy (%) accuracy (%) accuracy (%) accuracy (%) accuracy (%) 
Viewpoint 1 92.31 100 100 100 80 
Viewpoint 2 98.08 95 100 100 100 
Viewpoint 3 83.33 90 100 100 75 
Viewpoint 4 86.79 90 100 100 80 


5. CONCLUSION 


This study aimed to verify the effectiveness of using deep learning for detecting structural members and 
construction equipment at a construction site. To overcome the limitation of existing training datasets, a 
verification experiment was conducted using images created from photographs of other construction sites. The 
existing convolutional neural network (CNN) was fine-tuned with different learning weights to detect structural 
members from actual construction site photos. The following results were obtained: 


¢ The shape of the target structure could be detected from the construction site photographs by considering the 
detection result image. 


e — Avolume calculation system was constructed using the deep learning model's segmentation results, enabling 
volume calculation on a three-dimensional model based on the shape information from two-dimensional 
images. 


e The 3D model that reproduces the construction site was displayed on BIM software like Revit by acquiring 
the element ID from the 3D model using the Unity game engine. 
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: The recall of the constructed 3D model at each viewpoint showed an average of 90%, demonstrating high 
accuracy by combining detection results from multiple viewpoints. 


° By assigning attribute information of construction unit prices to each member, it was possible to calculate 
the work volume based on the work form. 


To improve the accuracy of the volume detection system, the deep learning model's detection accuracy needs 
enhancement. The training dataset should include information on detecting obstacles in front of the target. The 
applicability of the proposed method to other structures and the diversification of the system needs further 
investigation. 
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FIRE SAFETY ENGINEERING: THE COMPUTATIONAL SIMULATION 
OF THE ESCAPE IN A HISTORIC BUILDING IN BOLOGNA 


Ing. Stefano Tagliatti & Prof. Ing. Marco Alvise Bragadin 
University of Bologna, Italy 


ABSTRACT: In the field of Fire Safety Engineering (FSE), virtual reality has increasingly assumed an important 
role, especially for the simulation of fire and escape. 


The present work aims at comparing the potential of virtual simulations of the escape of occupants in case of 
emergency to simulations based on traditional calculations. Above all, the goal is to highlight the greater 
adherence to reality of the simulations that use behavioural models compared to those that use hydraulic models. 


Simulations are performed for the case study of a listed historic tower in Bologna city centre and calculate the 
Required Safe Escape Time (RSET) in various evacuation scenarios using the innovative Pathfinder® software 
which, in addition to using flow-based models, is "agent-based", as it manages the variables related to behavioural 
factors and can model complex escape scenarios faster than hand-made calculation. 


Case study results show that RSET times calculated with the behavioural steering mode in the virtual environment 
are 15-19% higher than the hydraulic mode (SFPE) and therefore demonstrate that the Steering mode is more 
realistic, as human behaviour significantly influences the evacuation process. 


Anyway, all the realistic simulations return safety margin times above 100% of the RSET as asked by national law, 
highlighting that it is possible to guarantee the safety of the occupants in a particular historical building using 
innovative Fire Safety Engineering (FSE) approaches, even if the prescriptive rules are not respected. 


KEYWORDS: FSE, Virtual reality, Emergency Escape, ASET/RSET. 


1. INTRODUCTION 


Optimizing the fire prevention of a human activity means to identify technical solutions aimed at achieving three 
primary objectives: the protection of human life, the protection of assets and the protection of the environment. 
Therefore, in Fire Safety Engineering the chosen escape strategy, i.e., the one that ensures that the occupants can 
reach a safe location, independently or with assistance, before the fire causes incapacitating conditions, represents 
one of the most important and most complex designing strategies. 


The international legislation as well as the Italian one is increasingly moving from a prescriptive approach (with 
defined rules to be strictly applied), towards a performance approach with Fire Safety Engineering (FSE) which 
better allows to deal with the most complex situations. 


This new and innovative FSE approach is certainly more suitable for a specific assessment of the individual case 
under study, and it allows greater flexibility and gives greater autonomy and responsibility to the designer on the 
basis of rigorous scientific modelling. 


Scientific modelling brings a new responsibility to be taken on by the designer and this requires a greater 
knowledge of FSE processes and having new modelling skills. Virtual reality becomes very important as the 
scientific-predictive simulation software of the movement of people during the escape manages to model complex 
escape scenarios and gives data output quickly and efficiently, therefore a better understanding of the phenomenon 
is achieved even if environmental and specific conditions vary. 


The research work under this paper deals with the problem of escape in case of fire in listed historic buildings and 
in particular compares the results returned by an innovative simulation software with those obtained with other 
traditional calculation methods. 


The case study concerns the museum part of a historic tower in Bologna, the eighteenth-century Astronomical 
“Specola” Tower, in Bologna, built inside Poggi Palace, which is a historic building listed since 1911. Simulations 
of the escape were carried out assuming different emergency evacuation scenarios using the Pathfinder® software. 
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In the case of historic buildings, the second objective of fire prevention, i.e., the protection of assets, assumes 
particular importance and it can be said that among the most complex situations that a designer has to face there 
are certainly those of protected historic buildings. In these buildings it is very often difficult, or maybe even 
impossible, to comply with the safety requirements established by the fire regulations due to their particularity or 
uniqueness and to the constraints to which they are subjected, constraints that are of an artistic, historical, cultural 
nature and refer both to the building itself and to the context in which it is built. In many cases, the needed structural 
and MEP renovations could be unsustainable, both in terms of impact and in terms of cost. In fact, many time 
happens that the minimum dimensions prescribed by the regulations for escape routes are not respected (widths 
often less than 80 cm, heights less than 1.80 m), or the number of exits is inadequate, or the escape paths 
(unidirectional and otherwise) are too long, or it is impossible to build external stairs, and so on. 


With these situations, for which it is impossible to apply the measures provided for by the national technical rules, 
it is allowed to turn to the Fire Safety Engineering method (FSE) as considered by the New Italian consolidated 
Fire Prevention Code (Decreto del Ministero dell’Interno - DM - 2015, 3 August, updated by DM 2019, 18 
October) and as presented by the ISO international standards (ISO/TR 13387:1999). Therefore, the designer can 
use the FSE-based performance approach, which is aimed at the purpose of safety rather than at the rules to achieve 
it and can apply alternative solutions well-adapted to the specific case that create an equivalent safety level of 
performance. 


Given to the complexity of the application rules of FSE, the support of an escape simulation software such as 
Pathfinder® is needed. The software application allows the translation of simulation constraints into quantitative 
values, which can be inserted within a mathematical model of evacuation. In this way, not only the physical and 
geometric aspects of the structure, but also the qualitative, behavioural and physical aspects of the occupants 
(children, elderly, disabled, with their own speed and specific decision-making processes, etc.) can be modelled. 


2. METHODS 


The case study concerns the Specola Tower Museum (Fig.1, 2, 3) which develops entirely inside the tower, from 
the fourth to the eighth floor (47 m altitude), while the first three floors are included in the structure of Poggi 
Palace (Fig. 1 right). The work involved numerous inspections in collaboration with the museum managers to carry 
out all the necessary surveys including measurements and identification of all the components of fire safety system. 
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Fig. 1: Picture by Monti P. (1974) of the Specola tower (left) and an 18" century reproduction in section (right) 
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Fig. 2: Staircases: from the ground floor (left); access to the museum area from the 3rd to the 4th floor (right) 


Fig. 3: Spiral staircase from the 4th to the 7th floor of the tower 


2.1 The fire safety compliance solution in application of the rules 


It should be noted that the New Italian consolidated Fire Prevention Code provides for rules valid for all human- 
based activities (the so-called “Regola Tecnica Orizzontale” RTO or "Horizontal Technical Rules") and specific 
rules for historic buildings (the “Regola Tecnica Verticale” or "Vertical Technical Rules" RTV10 and RTV12). 


The analysis of the case study was based on the life risk profile, Rvita (given by the characteristics of the occupants 
and the rate of growth of the fire), as well as on the maximum crowding of each room of the structure. The designers 
found that the rigorous application of the compliance solution for evacuation was not possible, as the prescriptive 
rules of the Code (RTO) and of the RTV10 specific for museum activities in buildings subject to protection, were 
not all fulfilled. 


e the heights and lengths of escape routes and dead-end corridors comply with the requirements. 


e each floor is a specific fire compartment (there are two activities that are not pertinent to each other: museum 
and offices/services, which are not in separate compartments). 


e the stairway area, with annexed landings and corridors, essential for emergency evacuation, is correctly 
compartmentalized and would constitute a temporary safe area; however, since it is a multi-storey building of 
considerable height (47 m), the standard requirement is that the current single compartment of the stairwell 
delimited by REI fire doors is divided into at least three compartments in order to be able to be considered as a 
temporary safe location (maximum 18 m for the RTV10). 


e the width of some horizontal escape routes (Fig.4 on the left) is not compliant with the law (72-73-75 cm, while 
the limit is 80 cm for the RTV10) and also the width of the spiral staircases (Fig.4 on the right) from the fourth to 
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the eighth floor is not compliant with the law (71-75 cm, while the limit is 80 cm for the RTV10). 


e there is only one way out, which, although admitted, is problematic due to the physical characteristics of a part 
of it, being a spiral staircase, 5 floors long and rather narrow and steep. 


minimum minimum y minimum minimum 


compartment Ryrra ae crowding width width real width compartment Rya a crowding width width real width 
per p L (RTV10) perp L  (RTV10) 
(mm/pers) (persons) (mm) (mm) (mm) (mm/pers) (persons) (mm) (mm) (mm) 


4° floor B2IBI/A2 4.1 36 147.6 800 730 sare 2.85 126 359.1 800 820 


(from 4" to ground floor) 


stairwe 
(from 5" to 4" floor) 
stairwe 
(from 6" to 5" floor) 
stairwe 


(from 7" to 6" floor) 
stairwe 


(from 8" to 7" floor) 


5" floor B2/BI/A2 4.1 22 90.2 800 820 B1 3.1 90 279 800 750 


6"floor BI/A2 3.8 18 68.4 800 1070 B1 3.4 68 231.2 800 750 


7" floor  B2/B1 4.1 27 110.7 800 850 Bl 3.8 50 190 800 750 


8" floor Bl 3.6 23 82.8 800 720 Bl 4.25 23 97.75 800 710 


Fig. 4: Comparison between horizontal (left) / vertical (right) exit widths required by rules and those of the case 
study. 


2.2 The alternative FSE-based solution in application of Section M (Fire Prevention 
Code) 


As the compliance solution is not fully applicable, the designers have switched to the performance approach as 
proposed by FSE and an alternative solution, provided for by section M of the Code, was proposed to calculate the 
escape time. 


By using the FSE-based performance approach, the designers can propose alternative design solutions that provide 
an equivalent fire safety level of performance and are sustainable both in terms of architectural and environmental 
impact and in economic terms. 


The alternative solution involves comparing the Available Safe Escape Time (ASET), i.e. the time available for 
the escape guaranteed by the building, and the Required Safe Escape Time (RSET), i.e. the time actually taken by 
the occupants for the escape, from the moment the fire is triggered to the moment they reach a safe location to 
save themselves. The established engineering criterion is that ASET > RSET and that the difference between the 
two, i.e. the safety margin (tmarg), is greater than or equal to 10% of the RSET, and in any case not less than 30 
seconds, in the event that the ASET derives from a reliable calculation (obtained by using fire simulation models 
such as FDS, for example, which is one of the most used software), or is greater than or equal to 100% of the 
RSET, otherwise. In the case study presented, the reference is the latter, because the ASET was not simulated, but 
data from examples relating to listed buildings taken from the literature were used. 


To calculate the escape time, it is necessary to obtain the RSET time. 


The international standard ISO/TR 16738 of 2009 (implemented in the Fire Prevention Code - Fig. 5) defines the 
RSET as the sum of 4 components: 


RSET = taet + ta + tore + ftra 
where 
tat is the detection time 
ta is the alarm time 
tpe is the pre-travel activity time, PTAT 
tra is the travel time 


Of these 4 components, only the calculation of the movement time, Time of travel (Tira), was carried out because 
the other 3 components of the RSET time are calculated with the European standard (ISO/TR 16738:2009) and 
were assumed equal respectively to: 60 seconds the Taet (which is considered cautionary since in the museum there 
is the presence of an automatic fire detection and alarm system-IRAI), 0 seconds the T, (as there is an automatic 
IRAI with optical-acoustic panels) and 30 seconds the Tyre, which is a low value, because it is due to the presence 
of awake occupants, without motor disabilities and trained guides who always accompany visitors to the Museum 
and who help them with wayfinding. 
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Fig.5 - Comparison between ASET and RSET (Illustration M.3-1 of the Code DM 2019, October 18, modified) 


For the calculation of the movement time (ttra) two alternative methods can be used: 


1. flow-based models (hydraulic): they are macroscopic models that represent people as a homogeneous 
group and in which the movement of the crowd is considered similar to the flow of a fluid. 

2. ABM agent-based models (behavioural): they are microscopic models that take into account human 
behaviour, as well as movement, and in which the occupants are considered individually. 


2.3 Escape simulation software and Pathfinder® 


Simulation software are essential for the development of behavioural models because they are able to manage a 
multiplicity of factors: technical, physical and geometric considerations are accompanied by many components 
connected to the human behaviour of the individual occupants. 


These software have been categorized from each other by NIST - National Institute of Standards and Technology 
(E. Kuligowski, et al. 2010) based on the different characteristics: modelling method, model purpose, type of 
structure, model view of the occupants, behaviour of the occupants, type of movement of the occupants, ability to 
enter fire data, ability to import CAD data, visualization and validation methods. 


The software chosen for the case study is Pathfinder® by Thunderhead Engineering, an agent-based simulator. In 
addition to being based on the hydraulic models that describe the movement, Pathfinder® manages the behavioural 
variables, describing complex behaviours and reciprocal interactions. Fach occupant is defined by a set of 
parameters which determines its behaviour during the evacuation phase and the interactions with the other 
occupants. Pathfinder® is designed to simplify the input phase of the many information managed, but what is more 
important is that it has a powerful 3D graphical interface that shows the filmed sequence of the virtual evacuation 
in a very realistic way. 


Pathfinder® uses a three-dimensional geometry model, however, for simulation purposes, the elements considered 
are only of the two-dimensional type, to reduce the calculation complexity. The Pathfinder® movement 
environment (3D continuous space model) is automatically transformed into a 2D navigation triangular mesh 
(represented by adjacent triangles) on which the occupants are free to move. The use of triangular meshes for the 
geometric representation allows the software to discretize even curved surfaces quite effectively. Obstacles (up to 
1.8 meters away from the floor) are represented in the navigation mesh as empty spaces, which prevent the 
occupants from being able to move in spaces that house walls, furnishings, objects and therefore in fact they can 
only move on the navigation mesh. The navigation geometry is organized into irregularly shaped rooms, with 
boundaries that cannot be crossed by the occupants. The passage from one room to the adjacent one can be done 
through connecting doors. A door that does not connect two rooms and is placed on the outside boundary of one 
room is defined as an exit door. See Figure 6 as an example of a navigation mesh, where occupants are indicated 
with blue dots, doors with orange lines, and the exit door with a green line. Any mesh zone can be classified as 
one of four types: open space (room or ramp), staircase, connecting door and exit door, each with a different effect 
on the occupants’ behaviour. 


In the user interface, each person can be assigned their own profile and behaviour. The profile defines the fixed 
characteristics of the occupants (i.e., maximum speed, size, colour). The behaviour defines a series of actions that 
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the occupant performs in the simulation (for example moving to a room, waiting time, exiting, route stops). Based 
on individual characteristics, each occupant makes autonomous decisions on which path to take. It is possible to 
define groups of occupants with the same behavioural properties who look for each other and who maintain a 
minimum distance between them (such as families, colleagues, students), associating them with a leader profile 
(for example a tour guide in a museum). 


Fig. 6: Example of 3D geometry and related 2D navigation mesh with evidence of rooms, doors and exit doors 
(Pathfinder Technical Reference Manual, 2022) 


Pathfinder® provides two modes of occupant movement simulation: 


e SFPE (hydraulic model) which reproduces the concepts and calculations defined in the “SFPE Handbook 
of Fire Protection Engineering, 2016” and in the “SFPE Engineering Guide: Human Behavior in Fire, 2019” 
and considers the movement of the occupants as a flow model where walking speeds are determined by the 
density of occupants within each room and flow through doors is dictated by their width. In SFPE mode the 
occupants do not attempt to avoid each other but may overlap. The main parameters used in SFPE mode are 
the following: maximum occupant density for the room, effective width of the door (Boundary Layer), 
specific flow through the doors, movement speed of an occupant. 


Steering (behavioural model) which reproduces human behaviour and movement as much as possible and 
is based on the studies carried out for the first time by C. Reynolds (“Steering behaviours for autonomous 
characters - 1999”): through a combination of guidance and collision management mechanisms (with people, 
walls or objects), it allows each occupant to proceed towards his goal while avoiding other occupants and 
obstacles along the way, proceeding in lanes in the case of counter-current occupants, following other faster 
occupants, etc. The movements of each occupant in the different possible directions are evaluated and the 
optimized direction is then determined. The main parameters used in Steering mode are the following: 
maximum speed of each occupant, maximum acceleration and occupant density. 


2.4 The calculation of the Ttra with Pathfinder® in the case study 


Firstly, hand calculation for escape time computation was carried out with a hydraulic model and then Pathfinder® 
was used to develop the simulations in the two methods provided: SFPE and Steering. 


The simulations were carried out to determine how the evacuation time varies according to the different 
hypothesized scenarios, which differ from each other in terms of the number of occupants, their type and their 
location. 


The working procedure was as follows: 

e Setting the geometric characteristics of the building by importing the 3D DWG files into the software (Fig.7 
on the left) 

creation of 5 occupant profiles (guides, adults, elderly, children and staff) with specific characteristics (travel 

speed, shape, size, reduction factors, etc.). In Fig.7 on the right the input interface of the "child visitors" 

assignment of a behaviour (choice of exit, priority, initial delay, assistance to others, waiting for assistance, 
etc.) for each profile 

choice of /0 scenarios (see list in Fig.9): 

— 5 "realistic" scenarios with 28 occupants (different in type and location), of which 20 visitors admitted at 
the same time, 2 guides and 6 office workers (in Fig. 8 on the left, the simulation for scenario II in which 
the mixed visitors - children in yellow, adults in red, elderly people in green and guides in black - they are 
partly on the 8" floor and partly on the 7" and the office workers, in blue, are all on the 4" floor) 
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— | "theoretical" scenario with 126 occupants, maximum occupant density admitted by Code (in Fig.8 on the 
right) 
— 4 intermediate “theoretical” scenarios (with 90, 74, 56 and 29 occupants) 
e Calculation of Ta for each simulation with analysis of the speed of the occupants and any critical situation. 


10 simulations in Steering mode and 2 in SFPE mode were performed: Fig.9 shows the results obtained. 
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Geometry bounds minimum: 505, 763601, -240,222600, -20, 138355 Laes 
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[Fatter so geometry ies in one pine innri 6 


[C Adi a blank rectangle to obscure lower floors (]aeduce damet ty move trough raros geometry 


Fig. 7: Pathfinder® user interfaces for importing the 3D DWG file (on the left) and for the characterization of 
the "Child Visitors" occupants (on the right) 


Fig. 8: View of the occupants in simulation II (mixed typology) and in VI (126 people) 


The main outputs of the software consist of realistic 3D graphic elaborations and 2D graphics that allow to 
reproduce the evacuation and to analyze, even with video still images, any critical situation (queues, high density 
areas) and the speed of the occupants (see by way of example Fig. 11 and Fig. 12). 


Particularly effective in terms of clarity for understanding the phenomenon are the video that reproduce the entire 
sequence of the evacuation with all its critical points in a very realistic way and with effective times. 
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CONVE 
CONV! 


scenario total adult elderly children office pre-travel evacuation 


(Pathfinder crowding guides visitors visitors visitors workers time (PTAT) travel time time RSET 
"Steering") (persons) (persons) (persons) (persons) (persons) (persons) (sec) (sec) (sec) (sec) 
low crowding mixed typology visitors all 
I th 28 2 10 5 5 6 30 272.5 302.5 362.5 
on the 8" floor 
me po odor SAR peye 28 2 10 5 5 6 30 264.5 204.5 354.5 
both on the 8" and 7" floors ä = į ë A 
imis oeng mired Bhs Vor 28 2 10 5 5 6 30 241.3 271.3 3313 
adults on the 8" floor, others on the 7" i ` i 
low crowding adults visitors only 
TV th 28 2 20 0 0 6 30 219 249 309 
all on the 8" floor 
v low crowding children visitors only 28 2 0 0 20 6 30 288.3 318.3 378.3 
(or elderly only) all on the 8" floor i i R 
high crowdi ixed 1 isitors 
Vi ee tee ae 126 11 55 27 27 6 30 366 396 456 
in all floors (compliant solution) 
mixed typology visitors as scenario VI 
VII th 90 8 38 19 19 6 30 355.5 385.5 445.5 
except 4" floor 
mixed typology visitors as scenario VI 
VII th th 74 6 31 16 15 6 30 315.8 345.8 405.8 
except 4" and 5" floors 
Ix mixed typology visitors as scenario VI 56 4 23 12 11 6 30 302.5 332.5 392.5 
except 4", 5" and 6" floors r n i 
mixed typology visitors as scenario VI 
x 29 2 11 3 5 6 30 275 305 365 


only 7" and 8" floor 


hydraulic mixed typology visitors 


hand as scenario VI 28 2. 10 > 5 6 30 184.3 214.3 274.3 
method only 7 and 8" floor 


Pathfinder mixed typology visitors 
"SFPE" as scenario VI 28 2 20 0 0 6 30 181.3 211.3 271.3 
(hydraulic) adults only, 8" floor only 


Pathfinder mixed typology visitors 
"SFPE" as scenario VI 126 11 55 27 27 6 30 292 322 382 
(hydraulic) mixed typology on all floors 


Fig. 9: Escape simulation times processed with Pathfinder® in the case study 


3. RESULTS AND DISCUSSION 
3.1 Effectiveness of the alternative solution in the case study 


The calculations and simulations carried out demonstrate how it is possible to guarantee the safety of the occupants 
in a particular historical building using alternative measures, even if the prescriptive standards are not respected. 


Therefore, it was possible to evaluate the efficiency of the design system and to show that the fire safety measures 
adopted in the case study are sufficient to guarantee an adequate level of protection of life and assets. All "realistic" 
simulations, in fact, return safety margins higher than 100% of the RSET time. 


It must be said that concerning the case study, an historic tower museum, some important management fire 
protection measures have been adopted by the museum organization, that help to increase the safety margin, 
reducing the RSET: 


e the non-simultaneity of the two activities carried out inside the building (museum and offices/services). 

e the limitation of the maximum crowding of museum visitors, divided in no more than two groups (one for 
each of the two museum guides). 

e the ineligibility of visitors with motor disabilities. 

e the access of visitors only accompanied by properly trained guides, who lead people who otherwise would 
not be familiar with the place. 


In addition to these safety measures, an important role is played by the preventive and protective measures adopted 
by law, i.e. the presence of an automatic fire detection and alarm system (IRAI); the entire active protection system 
for extinguishing the fire (fire extinguishers and internal fire hose reels) built in all the museum spaces; a clear 
signal that facilitates the wayfinding process; the compartmentation of the escape routes (staircases and corridors 
on the ground floor) open to the public, with the insertion of suitable REI fire doors; the limitation of the fire loads 
in these paths (because of this they can be considered as temporary safe locations). In consideration of the above 
fire protective measures, it can be said that the overall safety is excellent, as the horizontal paths in the various 
floors are very short. 


However, as already mentioned, the ASET time taken as a reference would require a more precise calculation using 
fire simulation models, i.e., deterministic models based on the principles of physics and chemistry. For this reason, 
in this study, a safety margin equal to 100% of the RSET was always considered, as required by the international 
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and local regulations. Furthermore, again to play on the safe side, not all the management fire protection measures 
mentioned above and actually adopted, were considered as present in the simulations. 


3.2 Utility of the escape simulation software 


The simulations (Fig.9) show that the movement time obtained with the SFPE mode (181.3 sec) is more or less 
equivalent with the one obtained with the hand calculation with the hydraulic model (184.3 sec). Therefore, using 
the software in SFPE mode, which uses the hydraulic model, there is no significant qualitative improvement of the 
results compared to the hand calculation, but only a greater computing speed. 


The simulations carried out with the Steering (behavioural) mode, on the other hand, show a significant difference 
compared to the simulations in SFPE mode (hydraulic model). It emerges, in fact, that the calculated times are 15- 
19% higher than those of the SFPE mode. This occurs because the SFPE mode admits the physical overlapping of 
the occupants in the queues that form in the exits and does not properly consider the interactions between the 
occupants themselves, which makes this mode clearly less adherent to reality (see Fig.10). 


Fig. 10: Occupant overlay at stair entrance using Pathfinder® SFPE mode 


Furthermore, with the behavioural steering mode, some critical situations clearly emerge which strongly affect 
escape times. Queues are created in emergency escape and gatherings near the access sections from the floors to 
the stairs, i.e., at the intersection (so-called "converging nodes") between the evacuation flows of the occupants 
who come from the floors and those who come from the spiral staircases. This demonstrates how the Steering 
mode is much more realistic, as human behaviour significantly influences escape times. The Pathfinder® software 
has various ways of representing these critical situations. Firstly graphs with the progression of the evacuation 
(Fig. 11 on the left) and secondly the flow rates in the reduced access sections (Fig. 11 on the right); then the 3D 
graphic elaborations with crowding densities (Fig. 12 on the left) and the graphic elaborations that represent the 
so-called Level of Service (LoS), that is the criticalities in the queues, in the walkways, and in the movement on 
the stairs (Fig. 12 on the right). 


Number of Occupants in Selected Rooms Flow Rates for Selected Doors 


Fig. 11: escape simulation times in scenario VI (left); flow rates at the significant gates in scenario I (right) 
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Fig. 12: Scenario I - on the left occupant density in the critical section of the stairwell at the exit of the 8" floor - 
on the right Service level (Walking LoS) in the most critical location on the 8" floor 


These simulations also show that the presence of visitors with reduced travel speed (children and the elderly) 
strongly limits the evacuation from the building (times longer by about 30%, as shown by the comparison between 
scenarios IV and V - Fig.9). 


Even the most critical simulation among the realistic ones (scenario V - school group of 20 children all on the 8" 
floor) appears to be in compliance with the prescription of the alternative solution of the FSE (safety margin must 
be greater than 100% of the RSET). The comparison of the assumed ASET and RSET gives as results the following 
equation: 


ASET - RSET = 106% RSET. 


For the sake of completeness, an unrealistic hypothesis was also developed. This hypothesis foresees the maximum 
crowding allowed by the compliant solution, equal to 126 total occupants (scenario VT). In this case, due to the 
numerous queues and gatherings that are created above all on the 6" and 7" floors in the access door to the stairwell, 
the criterion provided for by the Code is not respected, but there is still a wide time margin: 


ASET - RSET = 71% RSET. 


Another interesting evaluation uses simulations IX and X in order to estimate the maximum crowding which allows, 
with the assumed scenarios, compliance with the criterion tmarg = 100% * RSET = 390 seconds. The number of 
occupants is obtained by interpolation of the linear function referred to the two simulations IX and X and is equal 
to an estimate of 54 people (see the graph in Figure 13). 
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Fig. 13: Estimate of maximum crowding using the escape simulation times of scenarios IX and X 
In Figure 14 there are two screen shots of the VI simulation with 126 occupants, taken from the video made with 


the simulation software, which shows the entire evacuation and gives an idea of the power of virtual representation 
of reality that Pathfinder® has. 
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Fig. 14: Examples of 3D graphic visualization of the occupants in the escape simulations processed with 
Pathfinder® (Steering mode) in the case study 


4. CONCLUSIONS 


The most innovative escape virtual modelling software, such as Pathfinder®, are able to return high-quality results 
by means of video based visualization techniques that perfectly simulate reality and give an absolutely real 
perception of what can happen in the hypothesized situation. This software is used as a planning tool of the 
evacuation dynamics of an environment and allows to explore different evacuation cases and scenarios, varying 
the parameters of the simulation and the properties of the occupants. This makes possible to calculate the 
evacuation times in the various scenarios and to highlight the most critical ones. 


The research work under this paper clearly shows hydraulic models are less in adherence to reality than behavioural 
models that are more reliable by using a scientific-predictive simulation software of the movement of people during 
the escape such as Pathfinder®. 


Behavioural models are able to demonstrate how human behaviour significantly influences evacuation times and 
are able to give quantitative outputs, evaluated from a series of assigned parameters. 


Due to their greater flexibility, these software bring out critical issues that otherwise could not be considered in 
the fire safety design phase, as for example delays in time due to congestion and queues in areas of restriction or 
intersection of multiple flows. 


Anyway, the use of virtual simulation requires new skills and greater caution, as they are highly sensitive to some 
input parameters, but they contribute to a better understanding of the phenomena as they give quickly new results 
with the variation of the individual conditions. 


Due to their characteristics, they can therefore also be of great use for simulating the evacuation of occupants in 
other emergency situations, such as earthquakes, terrorism or other cases in which people's safety is threatened. 
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QUANTIFYING THE CONFIDENCE IN MODELS OUTPUTTED BY 
SCAN-TO-BIM PROCESSES 


Shirin Malihi, Frédéric Bosché & Martin Bueno Esposito 
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ABSTRACT: 3D spatial data is increasingly employed to generate Building Information Models (BIMs) by 
extension digital twins for various applications in the architecture, engineering, and construction (AEC) sector 
such as project monitoring, engineering analyses, retrofit planning, etc. The outputted models of Scan-to-BIM 
processes should satisfy pre-defined levels of quality. In the case of emerging automated Scan-to-BIM solutions, 
users however currently need to check all generated geometry manually, which is time-consuming. What would 
help users is if the automated systems could also provide a level of confidence in the detection and modelling of 
each element. In this paper three generic indicators are defined for analysing the reliability of the generated 3D 
models: Icoverage estimates the portion of the surface of the modelled element that can be explained by the input 
point cloud. Idistance defines the closeness of the generated element models to the input point cloud. The confidence 
of the generated 3D local models can be computed by combining the two aforementioned indices. The proposed 
indicators are assessed using actual examples and comparisons are conducted between automatically generated 
3D BIM models and 3D models generated manually by a BIM modeler. 


Keywords: BIM, point cloud, confidence, indoor modelling, wall, digital twin 


1 INTRODUCTION 


Digital twinning of built environment assets is a modern data-driven process with benefits to improving 
performance and productivity within the Architecture, Engineering, and Construction (AEC) industry. It affords a 
multi-dimensional view of how an asset will perform by simulating, predicting, and making decisions based on 
real-world conditions (Boje et al., 2020). At the information (or data) level, it represents the information in a 
useful, structured form of description. At a higher level, it applies tools that make use of that information to provide 
diagnostics of why something might be happening, predict the possible future outcome, and decide the action 
based on the objectives. Optimising construction project execution (Akula et al., 2013; Bueno et al., 2018), 
building energy usage (Valero et al., 2021), and space utilization (Pan et al., 2022) are some of the numerous use 
cases of digital twins in the built environment. A built environment digital twin is commonly built from a Building 
Information Model (hereafter ‘BIM model’), which contains geometric and some semantics (such as element 
materials) that can be used to support the envisioned use cases (I. Giannakis et al., 2015). 


The generation of BIM models of new buildings is done during the delivery process with an as-design BIM model 
created during the design phase that should then ideally be updated into an as-built BIM model that incorporates 
any change made during construction. But, BIM models also increasingly need to be created for existing buildings, 
for example to plan refurbishment or enhance operation. 


In a 2020 survey conducted in 78 countries 2020, professionals asserted major benefits that BIM brings to the 
construction process from a geometric viewpoint (Rocha et al., 2021). However, the generation of BIM models is 
challenging due to the complexity and diversity of building geometry, and the possibly high levels of clutter 
existing in occupied buildings. New technologies, such as Terrestrial Laser Scanners (TLS) or photogrammetry 
(PG), now enable the acquisition of dense and accurate 3D geometric data, in the form of point clouds. Scan-to- 
BIM is the process to produce the as-built model of an asset from laser scanned or photogrammetric point clouds 
(Bassier & Vergauwen, 2020; Bosché et al., 2015a). It includes segmenting the data and generating a final 
semantically-rich 3D model (Rashdi et al., 2022). Despite the benefits afforded by those new sensing technologies, 
the generation of BIM models remains challenging due to the complexity and diversity of building geometry, and 
the possibly high levels of clutter existing in occupied buildings. 


Although Scan-to-BIM is generally a manual process in current industrial practice, there is extensive research in 
academia and industry to develop automatic Scan-to-BIM algorithms (Nikoohemat et al., 2019; Thomson & 
Boehm, 2015; Valero et al., 2021). For example, a Scan-to-BIM solution based on deep learning is developed in 
(Perez-Perez et al., 2021) for semantic segmentation. It classifies beam, ceiling, column, floor, pipe, and wall 
elements using two convolutional neural network and one recurrent neural network. 


Referee List (DOI: 10.36253/fup_referee_list) 

FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup_best_practice) 

Shirin Malihi, Frederic Bosche, Martin Bueno Esposito, Quantifying the Confidence in Models Outputted by Scan-To-BIM Processes, pp. 1137-1146, 
© 2023 Author(s), CC BY NC 4.0, DOI 10.36253/979-12-215-0289-3.113 


Some researchers (Bassier & Vergauwen, 2020) have gone beyond the problem of object detection in scan-to- 
BIM, and presented results producing BIM models in IFC from semantic information extracted from point clouds 
(IFC is an open data schema described as in ISO 16739-1:2018 (BuildingSMART, n.d.)). 


However, (Rocha et al., 2021) reported that Level of Accuracy was used in 9.2 % of the research reviewed from 
the literature, which shows a big room for development and using this concept. Guaranteeing the completeness 
and accuracy of BIM models generated through Scan-to-BIM processes (manual or automated) is an important 
issue. In previous research, authors have mainly done this manually. For example, in (Skrzypczak et al., 2022) the 
authors compare the lengths from total station measurements and the BIM model generated from Scan-to-BIM 
approach. But, comparisons like this are established to check quality manually for the purpose of academic 
assessment. 


In practice, however, the user would need to know to what extent it can be confident that a scan-to-BIM algorithm 
has produced a correct model from the input point cloud data. Without any such information, the user will need 
to check every reconstructed element against the input data and gauge correctness manually, which is a time- 
consuming, and error-prone process that partially undoes the benefits afforded by automated scan-to-BIM 
algorithms. 


Reducing this manual work could be achieved if the scan-to-BIM algorithm could also report some level of 
confidence for the modelling of each element in the outputted model. In this paper, we explore two such generic 
metrics (and a third one combining them) to automatically assess the quality of the model generated by a Scan- 
to-BIM algorithm, focusing on geometrical fitness. 


The proposed indicators of the confidence are introduced in section 2. Section 3 then reports experimental results 
on their evaluation using some real case studies. Finally, the results are discussed and avenues for future work 
suggested in Section 3. 


2 METHOD 


This section presents the method proposed to calculate the level of confidence in BIM models outputted by Scan- 
to-BIM processes. Two different indices are defined Lcoverage, Laistance, and their combination that can be computed 
for any element in the outputted model. They are detailed in the following sub-sections. 


2.1 Tcoverage 


Lcoverage iS the principal index and aims to capture how much of the modelled 3D surface of a given element in the 
outputted model is explained by the input point cloud data. One quantitative measure of this consists in 
homogeneously discretizing the element's surface and check if some points from the input point cloud lay in the 
neighbourhood and describe that discrete surface. 


A practical way to implement this is to use space voxelization. First, for each modelled element, a voxelization is 
performed in its bounding box, with a resolution 6 (e.g. 6 = 2.5cm). The set of voxels intersecting the element 
mesh is then found (we use the method described in (Open3D)); we call this set m . Then, we identify the subset 
of voxels in m that also contain points from the point cloud. Points are searched inside the voxels. The centre of 
each voxel is the base of this search. KD tree structure is used to partition this space and efficiently search the set 


of points falling within each voxel. We call this second set Yo. We then define Icoverage as: 


A 


Icoverage = Yml (1) 


where |. | is the cardinality operator. [coverage takes values between 0 and 1, with 1 indicating that the entire surface 
of the element’s mesh has matched scanned point in its vicinity, i.e. within ô distance. 
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2.2 Taistance 


Leoverage captures how much of the modelled surface is explained by a point cloud in a somewhat coarse way, and 
considers neither how closely the modelled surface matches the point cloud nor the local orientation of the points 
and the modelled surface. The metric distance aims to complement I¢ovyerage. For this, we take the set of point cloud 
that are in the voxels in Yc (called n), calculate their closest (orthonormal) distance to the element mesh, and then 


compute Taistance aS follows: 


1.0 .n1+0.5 .n2 +0.25 .n3+0.125 .n4+0.0625 ns 


laistance = Ce a ee ee s CO (2) 


where n; is the number of the n points that are within (t/ z) distance to the mesh, n, is the number of the n 
points that are between (t/ 5)6 1 and (2/ z) distance to the mesh, 73 is the number of the n points that are between 
(2/.)5 and (3/;)8 distance to the mesh, n, is the number of the n points that are between(3/.)6 and (4/.)5 
distance to the mesh, and finally ns is the number of the n points that are between (4/, z) and ô distance to the 
mesh. 

Calculation of the distance between a point of the point cloud and a triangle of the mesh satisfying two conditions 
that are the projection of the point on the plane, formed by the triangle, should be located inside this triangle, and 
additionally the distance between the point and the mesh triangle should be less than the buffer size. Afterwards 
points inside this buffer are used to compute the Taistance using the weighted average formula in Equation 2. Taistance 
also takes values between 0 and 1, with 1 indicating that all the points are very close to the mesh (within GF z) 


distance). 


3 EXPERIMENTAL VALIDATION 


The validation of the proposed method is conducted using an example Scan-to-BIM algorithm, but the method is 
applicable to the use of any other algorithm. The employed Scan-to-BIM solution was developed as part of the 
EU-funded Horizon2020 BIMERR project (Valero et al., 2021). The whole solution is semi-automatic and aimed 
at producing as-is models that contain as much information as possible to support the efficient (automated) 
development of an energy model that can be used to conduct simulations for refurbishment planning. This solution 


is divided into three components (Valero et al., 2021): 


1. The Structural Scan-to-BIM component that automatically generates an IFC model containing the main 
architectural elements (floors, walls, openings, spaces) as well as second level space boundaries; 

2. The Mechanical, Electrical and Plumbing (MEP) Scan-to-BIM component that automatically enriches 
the IFC model with elements such as radiators, HAVC units and sockets (Bosché et al., 2015b); and 

3. Scan-to-BIM Editor to manually enrich the model with wall layers, materials, material properties and 
MEP properties. 


The modelling confidence metrics proposed herein could be employed in each component, but are assessed here 
in the context of the first component of the solution, the Structural Scan-to-BIM component. 


3.1 Experimental Data 


One of the pilot sites of the BIMERR project is the two-story Kripis House located in Thessaloniki, Greece. A 
coloured point cloud of the whole house (exterior and interior) was captured using a terrestrial laser scanner Faro 
Focus 150s, and subsequently subsampled to a density of 1 pt/cm’. 
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This house is furnished, however the interior is not much cluttered. The Structural Scan-to-BIM component 
automatically delivered a 3D model of the house in IFC format from that point cloud alone. A second pilot site is 
located in Bilbao, Spain. This is a multi-story apartment building, with essentially the same layout on all floors. 
Each floor contains four flats. A storey of the building was captured fully by terrestrial laser scanning. In contrast 
to the Kripis House, the Bilbao environment is cluttered because the building was scanned when fully inhabited. 
It contains wardrobes cupboards and other pieces of furniture and personal belongings inside rooms, resulting in 
significant levels of occlusion, which challenge Scan-to-BIM processes. 


To assess the value of the proposed confidence metrics to report Scan-to-BIM confidence levels, manual scan-to- 
BIM was conducted by architects using standard commercial software and the resulting models exported in IFC. 
While those models may contain errors, they are generally good and can serve as ground truth. The validation 
then focuses on the walls, as walls are the most frequent elements in the models. The walls modelled manually 
and with the automated process are compared. 


Figure 1 shows the Spanish dataset and the corresponding output IFC model generated by the Scan-to-BIM tool. 
Figure 2 shows the same information for the Greek dataset. 


J- 


(a) (c) (d) 


Figure 1 : Spanish dataset and Scan-to-BIM output. The 3D model generated manually by the BIM modeller (a), 
plan view of one floor (b), the point cloud of one apartment (c), and 3D IFC model outputted by the Scan-to- 
BIM component for one floor (d). 


(c) 


(d) 


Figure 2: The 3D model generated manually by the BIM modeller (a), plan view of one floor (b), the point cloud 
(c), and 3D model outputted by the Scan-to-BIM (d). Stuff of indoor spaces are displayed in the point clouds. 


(a) (b) 


3.2 Results and Discussion 


First of all, we report on the overall wall detection performance of the Scan-to-BIM component. In an IFC model, 
correctly detected walls are counted as true positive (TP), non-existing detected walls are counted as false positive 
(FP), and missing walls are counted as false negative (FN). 


Table 1 summarizes the performance obtained by the automated Scan-to-BIM algorithm. The results bring to light 
the challenges faced in the case of the Spanish dataset. While the automated Scan-to-BIM tool detected most walls 
(despite the clutter, noise and furniture of the rooms) but many walls modelled by the tool that actually do not 
exist, and four walls missed. In the case of the Greek project, 100% recall is achieved and 96% precision. Two 
FPs are reported, but these include a protruding beam confused as a wall and two columns modelled as a wall. 
These can in fact be considered acceptable because the automated Scan-to-BIM algorithm assumes that the 
structure of residential buildings is composed of walls and slabs only, and thus does not explicitly look for and 
model columns and beams. 
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Table 1: Wall detection performance of the Scan-to-BIM tool. 


TP FP FN Recall Precision 
Spain dataset 60 14 4 94% 81% 
Greece dataset 45 2 0 100% 96% 


Figure 3 and Figure 4 show some of the modelling errors made by the automated algorithm with both datasets. 
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Figure 3: Modelling errors in the Spanish dataset. Errors in the manually-generated BIM model and the 
automated Scan-to-BIM algorithm are shown in (1) and (2) respectively. (a) and (b) show a wall which is 
modelled as a room with walls due to the presence of wardrobes inserted in the spaces on both sides. (c) and (d) 
depict a wall which is modelled as thick the column that is embedded in it. (e) and (f) demonstrate FN wall 
examples. (g) and (h) display an error in the modelling of thickness of walls. (i) and (j) show FP walls due to 
clutter in the room near the wall. 
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(c) (d) 


Figure 4: modelling errors in the Greek dataset. (a) and (c) show the model generated by the automated 
algorithm, while (b) and (d) show the model generated manually. In (a) and (b) a wall is modelled instead of two 
columns and a beam. (c) and (d) show a wall modelled instead of a beam. 


Figure 5 reports the coverage Values obtained for each wall in the Spanish and Greek datasets. This figure plots 
the [coverage against the difference between the thicknesses of a given wall modelled manually (ground truth) and 
the same wall modelled automatically by the Scan-to-BIM algorithm. Note that in this experiment we use 6 = 2.5 
cm. The red vertical lines in Figure 5 are inserted on the distance of 26 . This line is important because, if the 
wall is modelled at the right location but with a thickness error larger than 26, than the number of points within 
ô of each wall side, and a result the value of I-gyerage Should be much lower. Figure 6 shows the Igistance against 
the wall thickness error for the two datasets. 


|_coverage l-Coverage 


l-Coverage 


35 
Thickness error (cm) Thickness error (cm) 


(a) (b) 


Figure 5: I¢gyerage and thickness errors for two datasets of Spanish (a) and Greek (b). They are coloured based 


on their corresponding ldistance and split into three groups. The red vertical lines are inserted on the distance of 
26. 
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(a) (b) 
Figure 6: Igistance and thickness errors for two datasets of Spanish (a) and Greek (b). 


Looking at Figure 5, one can see a trend where leoverage is lower for walls with greater thickness error (in 
particular >26). This implies some correlation between [overage and the confidence in the modelling quality. 
Nonetheless, some outliers do exist and good confidence seems to be only achieved for very high values of 
Icoverage FOr Igistance (Figure 6) a similar, but less positive trend can also be observed, as more obvious outliers 
can be observed. 


Figure 7 shows coverages and thickness errors of seven sample walls which were selected from different parts of 
Figure 5. (a, b) show examples where Icoyerage >0.5 and thickness error < 26. (c, d) show example for which 
[coverage 18 lower with values closer to 0.5, with one wall having thickness error >26 and the other just below 26. 
The average value of I¢oyerage in both cases are due to the fact that one of the faces of the walls is detected 
correctly, but the other one is wrongly detected. In the case of (c) this is due to a structural column (which the 
algorithm fails to detect) that leads to a gross over estimation of the width of the wall which subsequently leads 
to only very few points being matches to that face of the wall. The two large undetected windows also impact 
Icoverage- In (d), the second face of the wall is also wrongly modelled because the algorithm wrongly selected the 
curtains as the boundary for that wall. This results in lower (although not insignificant) thickness error which is 
still high enough to impact [coverage The conclusion is that, in both cases, [coverage rightly represents some level 
of error (for one face of the walls). (e, f) show walls for which [overage <0.5 and thickness errors > 2 6. These 
walls similarly have one face of the wall that is wrongly modelled, which is the source of the thickness error and 
implies that [coverage COouldn’t be higher than 0.5, as in the examples (c, d). But, (e, f) additionally contain 
undetected large windows and many occlusions due to desk, frame and mirror. Finally, (g) shows a wall that has 
a very low [coverage Value (0.23) despite a small thickness modelling error. This may first appear to show a 
weakness of the proposed [coverage index. But, actually, in this case, while the wall is modelled with only 2cm 
thickness error, both faces of the wall are wrongly modelled and the wall end up looking like it was modelled at 
a location 4cm away from its true location. Therefore, Icoverage tightly responds to this important modelling error. 


(a) Tcoverage=-76, Ethickness = 1.8cm (b) Tcoverage.=-68, Ethickness = 1.8 cm 
(c) Tcoverage=-9 1, Ethickness = 27.1cm (d) Icoverage =.63, Ethickness = 4.5 cm 
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(e) Tcoverage=-41, Ethickness = 11.3 cm (f) Icoverage=-35, Ethickness = 12.6 cm 


(g) Tcoverage-=-23, Ethickness — 2.0 cm 


Figure 7: Icoverage and thickness error of sample walls 


Overall, higher [coverage, and to a lesser extent Igistance, Show some level of correlation with smaller thickness 


modelling errors. However, in the presence of occlusion and obstruction from items such as cabinets, wall 
decorations, or curtains, coverage 1s less reliable as a determinant factor for confidence of the automated Scan-to- 


BIM tool. To has achieve a higher correlation with the confidence of modelling, we can first look at combining 
these indices as: 


Iconfidence = Icoverage 7 laistance (3) 


The results for [confidence are shown in Figure 8. This shows that confidence has a slightly improved correlation 
with the confidence level of modelling. 
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Figure 8: Iconfidence against the thickness errors for two datasets of Spanish (a) and Greek (b). 


A second observation is that, since many of the modelling thickness errors arise from the presence of pieces of 
furniture or decoration in close proximity to the walls as well as the confusion of the algorithm between walls and 
columns, it is suggested, for future work to explore the use of some point cloud semantic segmentation algorithm 
(e.g. (Armeni et al., 2016)), which could provide further support during the modelling as well as to refine the 
confidence index by ensuring that wall elements are indeed modelled with points that are mostly labelled as being 
in the “wall” category. 
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4 CONCLUSION 


Users of Scan-to-BIM algorithms and generally digital twins should be provided with reliable metrics of 
confidence for geometric modelling; hence they do not need to check everything manually to ease quality control 
and corrective works. 


For this purpose, we introduce indices related to coverage and distance. These indices use information to analyse 
estimate confidence of the modelling tool. Coverage and distance information of the point cloud and IFC models 
are used to determine the consistency of the modelling. The major indicator is defined based on the coverage. 
Information of coverage provides local qualification of the modelling. The coverage index shows the best results, 
but, However due to obstruction, and the presence of furniture and decorative items in close proximity of or onto 
walls, only fairly high values of this index (>0.8) can be used to have high confidence in modelling. The distance 
index can be combined with to it and to improve its results, but still further work is necessary to improve the 
reliability of these indices. Semantic segmentation could be employed to detect different elements such as desk, 
mirror, frame, cupboard, as well as distinguish columns from walls, which would then be removed before 
modelling and/or accounted for in the calculation of a confidence index. 
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ABSTRACT: The objective of this paper is to develop a semi-automatic method for constructing a practical finite 
element model from point cloud data of an entire span of a through-type steel truss bridge. In the first step, we 
introduced practical finite element models for truss bridges based on structural experiments and numerical 
analyses of a sway bracing located at the end support. We also proposed a basic method for semi-automatically 
constructing a finite element model of a sway bracing using point cloud data. This method was then extended for 
an entire of steel truss bridge. The point cloud data is converted to individual data structures which, in turn, are 
connected to construct a whole structure. The main members, such as upper chords, lower chords, and diagonals, 
are converted to fiber-based models by automatically creating central axis lines and cross-sections from the point 
cloud. The slab is converted to shell models by obtaining surfaces and thickness from the point cloud. The 
effectiveness of the proposed method was confirmed by comparing the analysis results from the finite element 
model manually created from the design drawing (drawing-model) with those obtained from the model generated 
by this method (point-cloud-model). The proposed method is more efficient than reading drawings and creating 
the models manually, and it was confirmed that the point-cloud-model shows response values close to those of the 
drawing-model within the design load. However, the reproducibility of the response values with more than the 
design load remains an issue, which can be solved by tuning plate thickness. 


KEYWORDS: Point Cloud, Fiber-based model, Steel Truss Bridge, Structural Analysis Model, Semi-Automatic 
Method 


1. INTRODUCTION 


A vast number of existing bridges are rapidly aging. Since it is not practical to rebuild all of them at the same time, 
strategic renewal through life cycle extension is required. To extend the life cycle of bridges, quantitative 
evaluation of the residual load capacity is being promoted through numerical analysis. The accuracy of the 
analytical model, such as finite element model configuration (dimensions, materials, and boundary conditions) has 
been verified through structural experiments and is now being realized with high reproducibility (Magoshi et al. 
2014), but the efficiency of the generation method still remains an issue. 


Construction of a finite element model requires acquisition of member dimensions of a target structure. However, 
in cases of old bridges, as-build drawings are often unavailable. In addition, conditions of bridges inevitably 
changed since its construction due to various factors. Therefore, it is necessary to construct a finite element model 
based on dimensions data instead of relying on drawings, but manual measurement is time-consuming and prone 
to various human errors. 


Therefore, a method to efficiently construct a finite element model from point cloud data, which can efficiently 
reproduce the 3D shape of an object, has begun to attract attention. Some existing methods (Suzuki et al. 2019 and 
Nakamizo et al. 2022) convert point cloud data to a finite element model by shell or solid elements, but the shell 
or solid element models are not practical because of their computational burdens. In addition, such methods are 
only applicable to simple structures like simple beams and not to usual structures consisting of multiple members. 
Therefore, it is necessary to apply structures with multiple members connected to each other and to convert to 
fiber-based models used in practice. Fig. 1 shows a type and outline of finite element models. 
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Fig. 1: The types of finite element models used in structural mechanics 


The authors have developed a method for constructing a fiber-based model from point cloud data and applied a 
case of sway bracing located at the end support of a steel truss bridge in the structural experiment (Hidaka et al., 
2023). The numerical result based on the fiber-based model constructed by the proposed method reproduced the 
experimental result very well. It is interesting to note that the model yields a better result than an analytical model 
manually constructed from the drawings. The point cloud based model can reflect accurately the state of a structure 
as it is. A real structure cannot avoid initial imperfections within tolerance. 


In this paper, the modeling method is extended to a whole truss bridge. A semi-automatic procedure is proposed 
and developed to construct a finite element model from the point cloud data of the entire side span of a through- 
type truss bridge. To verify the validity of the model, a model created manually from drawings was also prepared, 
and the response values were compared under the same loading conditions. 


2. CASE STUDY 


A two-span continuous through-type truss bridge in Aichi Prefecture, Japan, was measured in March 2023. A 
photograph of the bridge is shown in Fig. 2(a). The bridge length is 136.9m with a span length of 2@67.9m. A full 
width is 14.3 m with sidewalks on both sides. The effective width of the roadway is 7.5 m and that of the sidewalk 
is 2.0 m without width widening. A slab thickness is 200 mm with a pavement of 80 mm thickness (roadway) and 
of 30 mm thickness (sidewalk). In addition, the bridge is straight and has a symmetric cross slope. A general bridge 
drawing is shown in Fig. 2(b). 


In acquisition of point cloud data of the entire P4-A2 span, the stationary laser scanner (Leica RTC360, resolution: 
3mm@10m, accuracy: 1.9mm@ 10m) was used from the beneath of the girder and from the road surface. The 
number of points in the point cloud was about 1 billion. Furthermore, the handheld laser scanner (HandySCAN 
BLACK™ Elite, resolution: 0.05 mm@30 cm, accuracy: 0.025 mm@30 cm) was used to measure the detailed 
geometry of the parts of lower chords, braces, main girders, and lower lateral bracing. The lower chord, main 
girder, and lower lateral bracing were measured for the member closest to the abutment due to on-site restrictions, 
and two braces (two different cross-sectional shapes) were measured for the member closest to the pier (fixed 
bearing). The number of points in each point cloud was approximately 0.3 to 1.5 million. The point cloud data is 
shown in Fig. 2(c). The total measurement time was approximately 3 hours. The coordinate system was set so that 
the x-axis is along the longitudinal direction, the y-axis is transverse direction, and the z-axis is along the height 
direction. Table 1 shows the dimensions of each member components. Measured values, data in the as-built 
drawings and data by the handheld laser scanner are given. 


The bridge is a two-span continuous bridge, but due to on-site restrictions, only P4-A2 span were measured; to 
interpolate the parameter of P3-P4 span, the altitudes of fulcrum at both ends were measured with a total station. 


3. PROPOSED METHOD 


A computer program is developed to generate fiber-based models for numerical analysis from point cloud data 
obtained in Chapter 2. To construct a fiber-based model, nodes along an axis passing through a center of each 
member, elements connecting the nodes, a cross-sectional geometry of each element, and other material and 
loading conditions are required. 
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Table 1: Arrangement of measured dimensions of bridges 


(Movable Bear ing) 


(c) Measured point cloud data 


Fig. 2: The case study bridge and measured point cloud data 


(b) General bridge diagram 


Portions measured in details 
by a handheld laser scanner 


Lower chord 


Brace (Compression) 


Brace (Tension) 


Main girder 


Lower lateral bracing 


Cross section 


: as-built drawings, M: manual measured, H: 


handheld laser scanner, U: upper, L: lower, 


3.1 Creating nodes and elements for a fiber-based model from a point cloud 


D M H D M H D M H D M H D M H 
380 | 379.1 | 380.8 
Flg. 350 | 351.0] 351.1 | 350 | 352.0 | 351.2 | 230 | 230.0 | 230.2 | 360 |361.0 | 360.9 
Length 460 | 462.3 | 461.8 
Web 380 | 378.0 | 379.1 | 340 | 339.1 | 340.7 | 322 | 324.0] — | 1000 {1002.3} — 180 | 180.3 | 180.6 
9 = 8.5 
Thick Flg. 19 | 19.4 | 19.3 | 22 | 22.5 | 22.4 | 14 | 14.2 | 14.2 | 19 | 19.5 | 19.3 
10 | 11.1 | 10.6 
ness 
Web 11 11.3 | 10.9 | 19 E = 22 — |223 9 — 9.4 16 | 17.0 | 16.5 
(Unit: mm) 


-: not measurable 


In the first step, nodes and elements are created from point cloud data. In this step, taking advantage of the fact 
that braces of the truss bridge are connected to many members (such as upper and lower chords, upper and lower 
lateral bracings, and cross beams), nodes, the central axes of the braces are extracted from the point cloud data of 
the entire bridge, and the nodes and elements of the fiber-based model are created by making the points of 
intersection between the central axes of adjacent braces as grid points. 
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First, point cloud data of the entire bridge (Fig. 3(a)) is sliced along a longitudinal direction (in this case, the x- 
axis direction) from an abutment position at small intervals. A cross-sectional point cloud is obtained as shown in 
Fig. 3(b). By grouping points in the cross-sectional point cloud based on the Euclidean distance (Ester et al., 1996), 
point groups of cross sections of each member (such as upper and lower chords) are separated. Centroids of these 
cross sections correspond to a center axis of each member, and if continuing to slice, candidate points for the center 
axis of the members are created as shown in Fig. 3(c). If location of centroid is obtained by a simple average 
method, it is biased by the density of point cloud data. To avoid the bias, cross-sectional point cloud is converted 
to polyline with the convex hull (Preparata and Hong, 1977) method and a centroid is obtained using an image 
processing algorithm. 


Next, central axes are obtained from the candidate points for the center axis using the RANSAC method (Fischler 
and Robert, 1981). In the above procedure, however, central axes of other members than braces are inevitably 
included. The central axes of braces can be extracted by using a threshold value method based on the fact that they 
extend in the x- and z- directions. To create intersections of the central axes of the adjacent braces, the central axes 
are sorted as x-coordinates of the center points in the axes. When the two lines are in a twisted position, the 
intersection is defined here as a midpoint of a line segment that is orthogonal to two lines and has the shortest 
length. After creating the grid points, nodes are sampled at equally spaced intervals along the line segment 
connecting the two grid points, with the specified number of nodes. For the upper lateral bracing, its grid points 
are located at the center of the upper chords. 


Since grid points, nodes, and elements of main girders cannot be created from grid point of braces, they are created 
additionally. Fig.4 shows a summary to create grid points of main girders. A cross-sectional point cloud 
perpendicular to longitudinal direction at a location of a cross beam is obtained. Points within 1 m of the upper 
side of a line connecting grid points of lower chords (bottom horizontal line in Fig. 4) are extracted and divided 
point groups of main girders and others based on Euclidean distance. To extract main girder lines, the divided into 
point groups are converted into direction vectors by using principal component analysis and only the z- direction 
vectors are extracted as main girder lines (blue vertical lines in Fig. 4). Intersections of main girder lines and a line 
connecting grid points of lower chords are grid points of the main girders. To account for the possibility that grid 
points cannot be created at a few of cross beam positions, the average value of the created grid points was 
calculated. After creating grid points, if there is a cross beam position where the grid points cannot be generated 
by the above procedure, the calculated average value is applied. 


=) = (i) 
Divide segments based on = 
eucledian distance (g) 


Calculate centroid 
from each of segments 


E stemi (E) 


(b) Cross-sectional point cloud along longitudinal 
direction 


Detect lines of braces 
based on direction vector 
calculate from RANSAC 


(c) Centoroids obtained (d) Detected lines of braces and grid points 
from cross-sectional point clouds 
Fig. 3: Creating grid points of braces by using cross-sections and centroids 
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grid point b 
(lower chord)? grid point 


lower, chord} 
— 


Fig. 4: Creating grid points of main girders (cross mark is a grid point) 


3.2 Obtaining a cross-sectional geometry of each member for a fiber-based model from 
a point cloud 


In the second step, once the nodes and the elements have been created, the next step is to obtain a cross-sectional 
geometry to be applied to the elements. In finite element models, a cross-sectional geometry is represented by a 
collection of rectangles as shown in Fig. 5. The rectangle is defined by the coordinates of a start and an end point 
and its thickness. The coordinate system of the cross section must be converted to a two-dimensional coordinate 
system (such as u-v coordinate system) with the origin at the position through which the element axis lines pass. 


Cross-sectional geometry is obtained using cross-sectional point cloud perpendicular to a direction vector of the 
element passing through the midpoint of the element. Because point cloud measured by the stationary laser scanner 
(accuracy: 1.9mm@10m) is difficult to ensure accuracy of plate thickness, representative cross-sectional geometry 
is obtained from the point cloud data measured by the handheld laser scanner (accuracy: 0.025 mm@30 cm), and 
is applied to all the elements of the corresponding member. In addition, for the upper chords, the upper lateral 
bracing, and the cross beams which could not be measured by the handheld scanner due to on-site restrictions, the 
vertical and horizontal scales of cross sections of other members with similar shapes were adjusted. Specifically, 
the cross-sectional geometry of the upper chords is a vertically inverted that of the lower chords, the upper lateral 
bracing applies the I-section of the braces, and the cross beam applies to the cross-sectional geometry of the main 
girder. Table 2 shows the relation of them. For members whose entire surface could not be measure by the handheld 
laser scanner due to on-site restrictions, the symmetric center point of the cross-section at the same position was 
obtained from the point cloud data collected by a stationary laser scanner. Subsequently, the cross-sectional 
geometry was determined by duplicating the measured portion through rotational symmetry, utilizing the point 
symmetry of the cross section. In the following, the methods of obtaining cross-sectional geometry are explained 
according to the type of geometry. 


Table 2: The relation of applying cross-sectional geometry of members that could not be measured by the handheld 
laser scanner 


Unscanned member Upper chord Upper lateral bracing Cross beam 


Referenced member Lower chord Brace (Tension) Main girder 


Cross section 


3.2.1 Open cross-section (I-shape and T-shape) 


Midpoints of all point pairs in a cross-sectional point cloud are created (Fig. 6(a)), and the midpoints that are not 
on the cross-sectional point cloud are extracted as points of candidate centerlines. These are converted to straight 
lines by using RANSAC (Fig. 6(b)), intersection points of these lines and the cross-sectional point cloud are 
starting and ending points (Fig. 6(d)), and its thickness is obtained by doubling an average of the shortest distances 
from the cross-sectional point cloud (Fig. 6(c)). 
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Fig. 5: Parameters of section for 


fiber-based model Fig.6: Obtaining parameters of open-cross section 


3.2.2 Close cross-section (quadrangle-shape) 


The algorithm in 3.2.1 cannot be applied to a square cross section because back sides of plates cannot be measured. 
Therefore, as shown in Fig. 7(a), cross-sectional geometry is constructed by finding points at corners. Since corner 
points and joint positions are rounded, as shown in Fig. 7(b), corner points can be found by taking a local area at 
all points of the cross section and extracting the area where the radius of the circle is smaller when fitting it to a 
circle. For areas where thickness cannot be measured, such as the web and the upper flange of the lower chord, 


general plate thicknesses are used. 


Enlarged 
è : points representing y 
sectional geometry ee 
(a) Cross-sectional geometry of quadrangle-shape (b) Finding corner points by fitting to circles 


Fig.7: Getting parameters of close-cross section 


3.3 Creating a slab model as a shell model 


Only in a case of modeling for a slab, to reproduce load sharing of live load accurately, a shell model is constructed. 
It is connected to main girders and cross beams in the fiber-based model with springs. As shown in Fig. 8, a position 
of a slab is determined by finding the difference in z-coordinates between the main girder grid points and a 
centerline of the slab, and offsetting the z-coordinates from the main girder grid points by that value. In the same 
way as in Sec. 3.2.1, a centerline of a slab is obtained by finding midpoints of all point pairs in a cross-sectional 
point cloud of a slab and extracting points of a candidate centerline that are not on the cross-sectional point cloud. 
This is converted to a straight line, and the average of the shortest distances from the cross-sectional point cloud 
of the slab is doubled to obtain the slab thickness. As the slab thickness obtained in the above procedure includes 
the pavement portion, the thickness after substracting a typical pavement thickness of 80mm is used. 


grid point 8 j 
(lower chord)e > grid point 


\ ower chord) 


Fig. 8: Obtaining offset and thickness parameter of a slab 
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3.4 Implementation 


The proposed methods were applied to the point cloud data in Chapter 2. Due to memory limitations, the point 
cloud data measured by the stationary laser scanner was down-sampled from approximately | billion points to 
approximately 200 million points (3 mm pitch). Only the nodes and elements in P4-A2 span were created from the 
point cloud data, and those in P3-P4 span were extrapolated by duplicating the linearly stored height of the grid 
using the altitudes of fulcrum difference. Table 3 shows the development environment. 


The program that implements the proposed method is divided into several phases because it requires several 
manual operations in the process. The program flow is shown in Fig. 9. 


Table 3: Development Environment 


CPU Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz 2.39 GHz (2 processors) 
Memory 64GB 
GPU NVIDIA GeForce RTX 3080 (10GB) 
OS Windows 10 Enterprise 22H2 64bit 
Development Environment Microsoft Visual Studio Community 2022 64bit 
Library Point Cloud Library (PCL) 1.12.0 64bit (Rusu, 2011), OpenCV 4.5.5 64bit 
Programming Language C++ 
Structural analysis software SeanFEM (Earthquake Engineering Research Center Inc., 2007) 
Point cloud scanned by Point cloud scanned by 
stationary laser scanner handheld laser scanner 
yoy 


Program A (Sec. 3. 1) 


Representative section ' ' Representative section 
ı Point clouds of members ; ı Point cloud of the bridge , 
Program B (Sec. 3. 2) Program C (Sec. 3. 3) 
| Parameters of ; | Parameters of sections | Parameters of A 
ı [node and element |;  *-------p------' | offset and thickness , 
ı (excluding of slab) , 1 of a slab f 


Program D 


Finite element model 
(Nodes, Elements, Sections) 


Material conditions 


[e Boundary conditions 
y Load 


oa 
(Manual input) 


Analysis software 


Fig. 9: The flow of implement for the proposed method 


First, the nodes and the elements excluding for slab are obtained by inputting the point cloud measured by the 
stationary laser scanner and the handheld laser scanner into Program A, which performs the procedure described 
in Sec. 3.1. To obtain the cross-sectional geometries, the representative cross-sectional point cloud of each member 
is output. Furthermore, to obtain the nodes and the elements of the slab, the representative cross-sectional point 
cloud of the bridge perpendicular to longitudinal direction is output. 


Next, in order to obtain the cross-sectional geometry, the missing part is manually compensated, and the scale is 
adjusted to apply to other members. After that, they are input to Program B, which performs the processing 
described in Sec. 3.2. 


After the slab points are manually extracted from the cross-sectional point cloud data, they are input into Program 
C that performs the procedure described in Sec. 3.3. 


1153 


CONVR 2023. PROCEEDINGS OF THE 23°° INTERNATIONAL CONFERENCE ON CONSTRUCTION APPLICATIONS OF VIRTUAL REALITY 


Finally, the output results of Programs A, B, and C are input to Program D, which outputs the finite element model. 
Once the nodes, elements and cross-sections are obtained in this way, the material and boundary conditions are 
manually specified and entered into the analysis software. 


Excluding manual operations such as preliminary down-sampling and correction of missing points in the cross- 
sectional point cloud, it took 35 minutes to input the stationary laser scanner point cloud, create the KD search 
tree, slice the cross-section in the 100 mm pitch in the longitudinal direction, divide them using the Euclidean 
distance, and obtain the centroid of each member cross-section. In addition, it took 1 minute to extract the central 
axis of the braces from the point cloud of the centroid, 6.5 minutes to obtain the grid point position of the main 
girder, and | minute per section to obtain the end point and plate thickness of the cross section. The remaining 
processing was completed in less than 1 second. 


The generated finite element model is shown in Figs. 10. (a) is the framework element, and (b) is the cross-sectional 
geometries reflected on it. 


(a) Framework element (b) Cross-sectional geometries reflected on 
Fig. 10: The analysis model from point cloud by using the proposed method 


4. RESULTS AND DISCUSSION 
4.1 Results of repeatability analysis 


The response values of nonlinear analysis were analyzed by applying progressively increasing dead and live load 
according to the Japanese Specification for Highway Bridges (Japan Road Association, 2017). Dead load is loaded 
to whole of the bridge and live load is loaded as Fig. 11. The material conditions and reinforcement of the slab 
were set based on actual bridge design experience. To verify the validity of the model, the same loading conditions 
were applied to a model created manually from drawings. Hereafter, the finite element model generated by the 
proposed method will be referred to as the "point-cloud-model" and the model from the drawing as the "drawing- 
model”. 
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l 5 
T 2 
a oaks ak p LBZ] 3SKN/m? | 
x 13.5kKN/m?  6.75kN/m?  3.SkN/m?  1.75kN/m? Total: 4720kN 
Fig. 11: Live load 


The displacement of the point-cloud-model was close to that of the drawing-model within the design load that is 
the sum of dead and live load. Additionally, when examining the strain contour and deformation diagrams for the 
design load (Fig. 12), the deflection of the bridge exhibited a nearly identical behavior between the two models. 
However, with a load larger than the design load, the response value of point-cloud-model was different to the 
drawing-model (Fig. 13(a)). As the deformation diagram of the point-cloud-model for the 1.3 times of the design 
load (Fig. 13(b)), one upper chord at left-side in the center span of P4-A2 was extremely deflected. Consequently, 
a discernible variance arose between the response of the point-cloud model and that of the drawing model. 
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Fig. 12: Strain contour and deformation for the design load 
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represent the yield points of individual members) 
Fig. 13: Response value for the 1.3 times of the design load 


4.2 Discussion 


To improve maintenance efficiency, a method was developed to semi-automatically construct a fiber-based model 
(the slab is a shell model) that reproduces the structure of a truss bridge from point cloud data. When the drawing- 
model was created manually from drawings, it took several days to read the dimensions and connections of the 
members and input them into the software. In particular, the most time-consuming step was the input of cross- 
sectional geometries due to variations in plate thickness. The point-cloud-model, on the other hand, took about 
three hours to measure the point cloud data and less than one hour to generate the model from the point cloud data. 
Other manual work, such as processing noise and filling missing portions in the cross-sectional point cloud, can 
still be completed in 12 to 24 hours. The proposed method is expected to contribute to efficient maintenance and 
management. 


However, there are several issues related to the accuracy of elemental axis construction and the limits to applicable 
structures. 
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The point-cloud-model demonstrated a response closely aligned with the drawing-model under the design load. 
However, deviations emerged with a load larger than the design load. One plausible explanation stems from the 
inherent variability in plate thicknesses and member dimensions among upper chords, lower chords, braces, and 
lower lateral bracings unlike the brace panel in the previous research (Hidaka et al., 2023). The point-cloud-model 
encompasses disparities in cross-sectional geometries, as the stationary laser scanner's limited measurement 
precision and the handheld laser scanner's constrained range preclude the complete capture of all such geometries. 
Notably, the upper chord's cross-sectional geometry, pivotal for truss bridges experiencing significant forces, 
eludes handheld scanner measurement and significantly influences analytical outcomes, as evident in Fig. 13(b). 
Furthermore, the substitution of a representative, thinner cross-sectional geometry for the braces at the span's center, 
as described in Fig. 14(a), led to a marked reduction in overall bridge stiffness (Fig. 14(b)). These findings 
underscore the notion that a uniform approach to determining cross-sectional geometries across members with 
varying plate thicknesses sacrifices precision. Thus, an enhanced methodology is imperative to refine cross- 
sectional geometries and achieve heightened accuracy. 
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Fig. 14: The case of braces with thinner cross-sectional geometry 


(a) Cross-sectional geometry 


In the previous research (Hidaka et al., 2023), the fiber-based modeling approach was employed for simulating 
sway bracing. This involved adjusting node positions to accurately replicate member bending by utilizing centroids 
within localized regions. The incorporation of this technique successfully accounted for initial irregularities, 
resulting in a response that closely aligned with the actual structural behavior. However, in the present case, this 
method couldn't be employed due to its limited accuracy in centroid generation. This limitation stemmed from 
numerous factors, including the presence of numerous gaps in the point cloud data as well as the inclusion of 
extraneous points within local regions intended for centroid calculation. As such, there exists a pressing need to 
enhance the methodology in order to effectively surmount the aforementioned challenges. The proposed method 
is based on an algorithm that uses advantage of the fact that the bridge is a straight and has no widening and same 
truss spanning. It is required to improve the method to extend to curved or width extension bridges. A longitudinal 
direction of curved bridges and width widening may be obtained by tracing line of curb stones and white lines, for 
example. In addition, it may be effective to supple missing geometries and dimensions by using information of 
similar bridges. 


5. CONCLUSIONS AND FUTURE WORK 


In this paper, a method is proposed to semi-automatically generate a finite element model for practical use from a 
point cloud data for the entire of steel truss bridge without using drawings. The findings are as follows: 


@ The method to accurately obtain the geometry and member dimensions of the entire bridge was proposed by 
using stationary and handheld laser scanners. 
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© A program to construct a finite element model from a point cloud data was developed and its usefulness was 
demonstrated. 


@ The proposed method showed the response value is close to the drawing model within the design load, but it 
is required to improve detailed analysis results with more than the design load. A possible reason for this is 
that the representative section was uniformly determined for members although they have non-uniform 
thickness. 


In the future, we will address the issues of dealing with various plate thickness and curved bridges, and 
interpolation of unmeasured points using other dimension data and other parameters, such as road alignment. In 
particular, for applying appropriate cross-sectional geometries, the following two proposals are seen to be effective 
and will be implemented. 


© A method to determine appropriate members for measurement by using a handheld scanner will be developed. 
It is useful to estimate the members that are subjected to large cross-sectional forces according to the 
structural characteristics of a bridge type and span. Additionally, numerical analysis enables us to determine 
the members whose cross-sectional dimensions require high accuracy, such as the upper chords and diagonals 
in this particular project. 


@ Weare in the process of devising a methodology to extrapolate cross-sectional geometries from point cloud 
data measured by a stationary laser scanner. Despite the inherent precision limitation to a few millimeters 
associated with the stationary laser scanner's point cloud, it facilitates the extraction of crucial details 
concerning member classification, cross-sectional profiles, and member lengths. Leveraging this data, an 
effective approach to determine the suitable plate thickness can be formulated. 
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ABSTRACT: The operational phase of a real estate asset accounts for approximately 80% of the overall 
investment and management costs throughout the entire life cycle of the building, and the activities of space 
management and monitoring of building components and systems play a crucial role in ensuring the well-being 
and health of users. The AECO (Architecture, Engineering, Construction, and Operation) industry is transitioning 
towards a new framework governed by data-driven processes. In this context, Building Information Modeling 
(BIM) can support the utilization of big data generated throughout different stages of the building's life cycle, 
thereby establishing itself as a dynamic repository of information at the center of a constellation of systems used 
by a Facility Management body to achieve specific objectives (such as CAFM, ERP, BMS, etc.). 


The proposed study aims to define a processing framework for the collection and management of data aimed at 
the implementation of DT of existing real estate assets, created based on the integration between BIM platforms 
and IoT technology oriented to subsequent developments of big data analytics and AI applications. The objective 
is to support in the operational phase of buildings the decisions of the various operators involved in planning 
scheduled and/or corrective maintenance actions and to generate content, recommendations, best practices by 
formulating predictive analysis on managed assets. In particular, a critical analysis is made of the various 
approaches available for the definition of an IT architecture to support IoT reference models, which will find 
application in the monitoring of some existing assets of the University of Florence's real estate managed by the 
Building Area, digitally implemented on a BIM platform. The contribution is part of a broader research activity 
carried out as part of the PNR Project, "BIM2DT. BIM-to-Digital Twin: information management to support 
decision-making in the building life cycle." 


KEYWORDS: Facility Management; HBIM; Digital Twin, IoT. 


1. INTRODUCTION 


For several years now, the digitization of the construction process has been a primary objective of governments, 
organizations and in general stakeholders in the AECO sector (Daniotti et al., 2022) to reconfigure a production 
sector that, with different graduations in different countries around the world, lags historically behind the 
manufacturing sector in gathering the benefits that technological and process innovation through Information 
Technology can return in terms of efficiency, competitiveness and economic, environmental and social 
sustainability. 


In particular, the introduction of regulations at the national and international level regarding the information 
management of the construction process with Building Information Modeling (BIM) tools and methodologies in 
public supply, service, and works contracts has necessitated the redefinition of structured and planned flows of 
data and information exchange between the various stages of the delivery and operation process of real estate 
assets. More recently and in line with the Industry 4.0 approach, many organizations and business operators are 
orienting their decision-making processes toward data-driven strategies especially in the management of complex 
estate assets (Méda et al., 2021). In fact, the operation phase engages about 80 percent of the total investment and 
management costs of a building's life cycle (Volk et al., 2014), and the management and monitoring activities of 
spaces, building components, and facilities play a decisive role in ensuring the well-being of users and health and 
safety in living and working places. The use of BIM in Facility Management for an organization therefore becomes 
a key step in improving and optimizing the operation and maintenance activities of managed assets and, in a 
broader perspective, in contributing significantly to achieving the goals set by the European Green Deal and 
Sustainable Development Agenda 2030 (Ciribini et al., 2016; Mirarchi et al., 2018). 


Currently, information management of an existing asset in the operation phase is conducted through different tools 
and platforms, including Computerized Maintenance Management System (CMMS), Computer-Aided Facility 
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Management (CAFM), Building Automation System (BAS), Integrated Workplace Management System (IWMS). 
However, there is a lack on creating an integrated system to manage multiple information distributed on different 
databases. In this context, BIM can act as a centralized repository that can hold all the information about the 
building and its surroundings (Qiuchen Lu et al., 2019). In fact, much of the information produced in the design 
and construction phases is lost and only a small portion is passed on to the next operation phase usually in the form 
of spreadsheets with 3D information added according to client specifications. This loss of information can be 
avoided, or at least reduced, through the use of standardized and shared practices and procedures among the 
different actors in the supply chain, but also through open and interoperable exchange formats (Patacas et al., 
2015). This type of information, which we can define as “static”, can also be combined with “dynamic” 
information from the collection of heterogeneous data, both in terms of protocols and formats, during the use and 
management phase of the real estate. 


In particular, the rapid spread of the Internet of Thing (IoT) in daily activities has made available real-time data 
and information on many aspects of the operating conditions of buildings and their surroundings, which can be 
usefully employed by facility managers to improve building performance management, reduce energy 
consumption, optimize routine and extraordinary maintenance operations and increase user satisfaction and well- 
being. All this leads to the creation of a “digital twin” (Boje et al., 2020) of the building aimed at the management 
of existing assets, enabling a two-way exchange of information between the physical and digital worlds. 


This contribution represents a first outcome of a wider research activity within the PNR Project, "BIM2DT. BIM- 
to-Digital Twin: information management to support decision-making processes in the life cycle of buildings", 
which intends to define an operational framework for the collection and management of data aimed at the 
implementation of DTs of existing real estate assets, created on the basis of the integration between BIM platforms 
and IoT technology oriented to subsequent developments of big data analytics and AI applications. The objective 
is to support the decisions of the various operators involved in the planning of scheduled and/or corrective 
maintenance actions in the operational phase of buildings, and to generate content, recommendations, and best 
practices by formulating predictive analyses on managed assets. In particular, it is proposed a critical analysis of 
the various approaches available for the definition of an IT architecture that supports an IoT reference model, 
which is structured according to protocols parallel to those that today support the Internet infrastructure. The IoT 
model thus identified, will find application in some existing assets of the University of Florence’s real estate 
managed by the Building Area, digitally implemented on a BIM platform. 


Fig. 1 — Conceptual outline of the BIM2DT research project theme 


2. INTERNET OF THINGS FOR THE MANAGEMENT OF BUILT ASSETS 


Buildings are complex systems, from which a large amount of data (environmental, comfort, security, ...) can be 
collected by Building Management Systems (BMS). This data, which until now was stored locally, is moving to a 
cloud environment, increasing the adoption of IoT solutions. The BIM methodology itself is evolving with web- 
based and data-driven solutions. A major problem is the heavy reliance on native solutions that confine facility 
managers to proprietary platforms and do not provide the opportunity to develop evaluations across multiple 
systems. The information associated with BIM elements is also usually unusable outside modelling environments, 
and only a few applications have started to integrate data from different types of sensors. With the increasing 
demand for smart buildings, there is therefore a need to develop applications capable of accommodating 
heterogeneous data, which are transmitted according to different formats, protocols and languages. The starting 
point, however, must be an understanding of the infrastructure that revolves around the world of the Internet of 
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Things and how this can be used to improve and implement the semantic content of information models produced 
using BIM methodology. 


2.1 Internet of Things Model 


The term Internet of Things (IoT) was first coined in 1999 by British engineer Kevin Ashton, co-founder of the 
Auto-ID Center at MIT (Ashton, 1999). In 2001, the MIT Auto-ID Center presented its vision on the topic of IoT, 
which the International Telecommunication Union (ITU) later drew on in its 2005 Internet Report. The latter 
defines the term IoT as “a global infrastructure for the information society, enabling advanced services by 
interconnecting (physical and virtual) things based on existing and evolving interoperable information and 
communication technologies ” (Overview of the Internet of Things, 2012). The Internet Society instead speaks of 
IoT technologies as “scenarios where network connectivity and computing capability extends to objects, sensors 
and everyday items not normally considered computers, allowing these devices to generate, exchange and consume 
data with minimal human intervention”. Beyond the definitions, of which there is still no universal one (Rose et 
al., 2015), we can conclude that around the topic of IoT, new scenarios are developing concerning the tools and 
methods of digital management of building assets. 


When we speak of the Internet of Things, we are referring to any type of smart object capable of connecting to the 
Internet via a wired or wireless connection (through a dual communication capacity: M2M, machine-to-machine, 
and M2H, machine-to-human), and consequently having an active role within the communication processes. In 
order to be able to define a smart object, however, we need certain fundamental characteristics: the sensing 
component, i.e., the ability to gather information from the real world or to perform an action following an input, a 
unique identifier to identify the source from which the data is received, a connection to the Internet, for 
communication and notification of the information, and finally, one or more software platforms for the analysis 
and processing of the data collected 


The term IoT, in its simplest form, can thus be considered as the intersection between the internet infrastructure, 
objects and data. Other more complex definitions, however, lead to the inclusion of standards and processes, as 
these technologies make it possible to connect objects to the internet in order to exchange data according to 
industry standards that guarantee interoperability and enable the execution of mostly automated processes. We 
thus find ourselves having to manage not only an exchange of information between individuals but also between 
single devices. The potential applications of this technology are many, from smart home to smart cities, smart 
healthcare, smart retail or even connected cars. 


The main question to be asked is therefore how it is possible to achieve this type of interaction between very often 
heterogeneous systems and tools and to provide a simple and immediate service for the end user, both in the 
working environment and in everyday life actions. To address these problems, communication standards were 
created for specific areas, corresponding to the heterogeneous application areas of the IoT. We may, for example, 
have the need to have efficient communication with minimum packet loss but at the same time have a latency that 
allows communication to be defined almost in real time, or we may have the need for a less reliable protocol but 
with the peculiarity of operating on low performance and low power hardware. Each protocol offers certain 
functionalities or combinations of functionalities that make it preferable to others. Recurring factors that determine 
the preference of one protocol over another include geographical location, power consumption, physical barriers 
and hardware cost. 


The inapplicability of classical communication protocols even in the context of IoT devices depends on the 
minimal requirements in terms of hardware and power consumption required by these devices. It has therefore 
become necessary to develop new technologies that do not require complex computational efforts for sometimes 
very simple devices. The fundamental objective of an IoT architecture is to connect the physical world with the 
digital one, and over the years many entities, both international organizations and individual developers, have 
implemented new communication mechanisms, leading us today to have a wide choice of protocols at our disposal. 
Protocols that, however, were not designed to interact with each other, as they are based on different concepts and 
ideas. International organizations therefore, in order to prevent the vertical fragmentation of different commercial 
solutions, have set themselves the goal of defining open communication standards and mapping the traditional IP- 
based stack to the new IoT concept, understood as a network of heterogeneous devices connected to each other. 
Cisco, IBM and Intel have proposed an IoT reference model (fig.2) to standardize the concepts and terminology 
used in the IoT world based on seven levels. These levels are represented by: 
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1. physical devices and controllers: these are terminals that send or receive information; 

2. connectivity and communication between objects and networks: information must flow both horizontally 
between objects within the network, and vertically between different networks; gateways may be introduced for 
older devices not equipped with IP; 

3. edge/fog computing: is the data processing layer closest to the network with minimum latency from the data 
collection point; 

4. data accumulation: is level of data collection and storage; converts data-in-motion to data-at-rest; 

5. data abstraction: level of data aggregation from multiple devices and simplify data access to the application by 
creating schemas and data views; 

6. applications: provide the desired output through the interpretation of available information; 

7. collaboration and processes: level of involving people and business processes to make IoT application useful. 


Collaboration & Processes 
(lavoiving People & Business Processes) 


Application 
(Reporting. Analytics, Control) 


Data Abstraction 
(Aggregation & Access) 


0O00 


Data Accumulation 
(Storage) 


Edge Computing 
(Data Element Anatysis & Transformation) 


Connectivity 
(Communication & Processing Units) 


Physical Devices & Controllers 
(The “Things” in IoT) 


000 


Fig. 2: IoT Reference Model presented at IoT World Forum by Cisco, IBM and Intel 


This reference model follows the subdivision already used for the Internet, namely the OSI (Open Systems 
Interconnection) model, which consists of seven layers grouped into three media layers (physical layer, link layer 
and network layer) and four host layers (transport layer, session layer, presentation layer and application layer). 
Technologies such as Bluetooth and Wi-Fi use the lower communication layers while DDS or MQTT use, for 
instance, the application layer. 


2.1.1 IoT Devices 


The first elements that make up an IoT network are the devices themselves, which can be grouped according to 
their characteristics. According to the ITU, hardware platforms can be classified according to their computational 
and connection capabilities into three classes (Bormann et al., 2014): 

a) class 0, devices very limited in memory and information processing capabilities that sometimes do not even 
have the necessary resources to communicate directly with the internet in a secure manner and therefore rely on 
other devices such as proxies, gateways or servers; 

b) class 1, devices with certain limitations and that cannot talk to other nodes on the internet that use “a full stack 
protocol as HTTP, TLS and related security protocols and XML-based data representation”. However, they are 
able to use a specific protocol stack for limited nodes such as COAP and participate in the conversation without 
the use of gateways; 

c) class 2, less limited devices capable of supporting the same protocol stack as servers or notebooks. They can 
still benefit from the use of lighter and less energy-consuming protocols. Devices such as Arduino and Raspberry 
microprocessors fall into this class. 


From this classification, it can be seen that not all devices are able to connect to the Internet or process the collected 
data in situ. To facilitate the transit of information to the end platforms and to reduce the computational load, 
gateways are introduced, i.e. elements used, not necessarily, to establish communication from one device to another 
or to connect IP-based devices, which are not able to connect directly to the cloud environment. The data collected 
by IoT devices is in this case transmitted to a gateway, processed in the perimeter devices and then transmitted to 
the cloud. The use of gateways reduces latency and transmission size and also offers a higher level of protection 
to data-in-motion. When data is processed locally by the same device that collects it, it is called edge computing; 
when it is sent to a gateway for peripheral processing, it is called fog computing; and when it is sent and processed 
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within a cloud-based repository, it is called cloud computing. IoT applications can then be integrated with a data 
analytics engine for analyzing and customizing the output. 


2.1.2 Protocols and connectivity 


The concretization of the IoT concept has been made possible by the introduction of communication protocols, the 
most significant of which are: WSN (Wireless Sensor Networks), used in particular for sensing operations 
(environmental sensors); RFID, a system based on the use of radio-frequency waves, consisting of tags, readers 
and a back-end system that allows each ID to be associated with the corresponding physical object and any 
information relating to it; the NFC protocol, produced by Philips, Sony and Nokia, used to transfer data from one 
device to another over short distances. 


From these communication protocols, all the new standards we use today have developed, which we can group 
into two main classes, one short-range and one long-range. The short-range, low-power category is usually used 
for smaller environments such as homes or offices, which can be defined as Personal Area Networks (PAN). It 
includes technologies such as Bluetooth, NFC, Wi-Fi, Z-Wave and ZigBee. The long-range network category, on 
the other hand, allows communications up to 500 m with a minimum amount of energy. Within this category we 
find technologies such as LPWAN, from which proprietary solutions LoRaWAN and SigFox derive, as well as 
cellular IoT technologies such as NO-IoT, LTE-M and EC-GSM-IoT proposed by 3GPP. 


Within a specific network, devices communicate according to a particular type of protocol, i.e. a set of rules that 
work on different layers of their reference model and according to which data is transmitted and received along 
Internet backbones. The IoT is thus to be understood as a network that lives in parallel to the traditional protocols 
used, for instance, for the web (fig. 3) 


Internet IoT 
Stack Stack 


Fig. 3: IoT and Internet architecture 
At the application level, that is, at the interface between user and device, we find the following protocols: 


- CoAP (Constrained Application Protocol) (Shelby et al., 2014), a bandwidth and network constrained protocol 
designed to bring web functionality to devices with limited capacity. It uses a binary format rather than a text 
format like HTTP, for whose integration it requires the use of an intermediary (proxy); 

- MQTT (Message Queue Telemetry Transport) (Banks et al., 2019), a messaging protocol designed for lightweight 
computer-to-computer communications, used primarily for low-bandwidth connections to remote locations. It uses 
an author-subscriber crtiteria and is ideal for small devices that require bandwidth efficiency and battery usage; 

- AMQP (Advanced Message Queuing Protocol) (Godfrey et al., 2012), a specification for interoperable messaging 
for message-oriented middleware (MOM), creates interoperability between messaging middleware. It allows a 
wide range of systems and applications to interact, creating an asynchronous messaging system complementary to 
the http protocol. 
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The transport layer enables and protects the communication and transmission of data between different layers; 
within it we find: 


- TCP (Transmission Control Protocol) (Eddy, 2022), the dominant protocol for most Internet connectivity. It 
provides host-to-host communications by splitting large data sets into individual packets and resending and 
reassembling packets as needed; 

- UDP (User Datagram Protocol), a communication protocol that enables process-to-process communication and 
runs over IP. UDP improves data transfer rates over TCP and is optimal for applications that need lossless 
transmissions of information. 


The network layer helps individual devices communicate with the router; in this layer we find: 


- IP (Internet Protocol), many IoT protocols use IPv4, while newer implementations use IPv6. This recent IP update 
routes traffic over the Internet and identifies and locates devices on the network. 

- 6LoWPAN (IPv6 over Low Power Wireless Personal Area Networks) (Kushalnagar et al., 2007), is an IoT 
protocol conforming to the IEEE 802.15.4 specification that works optimally with low-power devices with limited 
processing capabilities. It allows the creation of wireless networks with devices that use the IP protocol for 
communication, through an intermediate layer placed between the MAC and network layers. 

- BACnet (Building Automation and Control Network), is a protocol developed by ASHRAE, and reported within 
ISO 16484-5, for managing building automation systems. Its goal is to create application protocols adaptable to 
all building control needs and transportable from one of the existing physical network technologies. 


The data layer (MAC) is the part that transfers data within the system architecture, identifying and correcting 
errors found in the physical layer; within it we find: 


- IEEE 802.15.4, an IEEE standard based on radio waves for a low-power wireless connection. It is used with 
Zigbee, 6LoWPAN and other standards to create embedded wireless networks; 

- LPWAN, networks allow communication between distances of 500 meters to more than 10 km in some locations. 
They arise to meet the special needs of the many applications that require wider coverage but do not need high bit- 
rates. The LoRaWAN network, developed by the LoRa Alliance, is an example of an LPWAN network optimized 
for low power consumption. 


Table 1: Most common IoT protocols 


Protocol Standard Frequency Distance 
NFC ISO/IEC 18092, ISO/IEC 21481, ISO/IEC 28361 13.56 MHz Max 10 cm (with other frequency you 
(universal frequency) can obtain different distances) 
Wi-Fi 802.11n (2009) — 802.1 lac (2014) 2.4 GHz — 5GHz 50 m (indoor) — 100 m (outdoor) 
BLE Bluetooth v.5 (based on IEEE 802.15.1) 2.4 GHz 50m 
ZigBee ZigBee 3.0 (based on IEEE 802.15.4) 2.4 GHz 10—100 m 
Z-Wave Z-Wave Alliance (proprietary technology) 800 — 900 MHz 10 m (indoor) — 100 m (outdoor) 
6LoWPAN Based on IEEE 802.15.4 Multiple physic support 20m 
LoRa LoRaWAN (ITU-T Y.4480) ISM 868/(915) MHz 10 km 


The physical layer is the communication channel among devices in a specific environment; part of this layer are: 
- Bluetooth, developed by Ericsson in 1994 and defined by the IEEE 802.15.1 standard, is an alternative to wireless 
information exchange using radio waves. It is optimal for high-speed data transfer up to 10 m. BLE (Bluetooth 
Low Energy) is a newer implementation that significantly reduces power consumption and cost while maintaining 
a connectivity range similar to that of classic Bluetooth; 

- Ethernet, a wired connection that provides a fast data connection with low latency; 

- LTE (Long-Term Evolution), a wireless broadband communication standard for mobile devices and data 
terminals. The LTE standard increases capacity and speed of wireless networks and supports multicast and 
broadcast streams; 

- NFC (Near Field Communication), a set of communication protocols using electromagnetic fields that allows 
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two devices to communicate at a maximum distance of 4 cm. When the two devices are brought close together, a 
peer-to-peer network is created that allows both to exchange information. They are typically used for contactless 
payments for mobile devices, ticket creation and smart cards; 

- PLC (Power Line Communication), communication technology that allows data to be sent and received over 
existing power cables. It allows an IoT device to be powered and controlled over the same cable; 

- RFID (Radio Frequency IDentification), uses electromagnetic fields to track otherwise unpowered electronic 
tags. Compatible hardware provides power and communicates with those tags, reading their respective information 
for identification and authentication; 

- Wi-Fi, standard 802.11, is a standard in homes and offices. It does not always fit all scenarios because of limited 
range and 24/7 power consumption; 

- Z-Wave, mesh network that uses low-energy radio waves for appliance-to-appliance communication. Being a 
proprietary technology, it is based on a different architecture on all layers; 

- Zigbee, developed by the ZigBee Alliance, now Connectivity Standards Alliance, IEEE 802.15.4-based 
specification for a suite of high-level communication protocols used to create personal local area networks with 
small, low-power digital radios. It is typically used in the context of the smart home where we find battery-powered 
devices. The total uptime is limited and most of the time the device is in a power-saving state (sleep mode); 

- Thread, developed by Nest and other companies, is based on the 6LoWPAN protocol for a mesh connection; 

- Matter, an open standard for the Smart Home developed by the Connectivity Standard Alliance with the aim of 
improving the compatibility and security of IoT devices. 


2.2 BIM and IoT 


In recent years, the massive development of digital technologies for acquiring data from variously deployed 
sensors for a multiplicity of uses and purposes, both within buildings and in the urban environment, has 
necessitated an expansion ofthe traditional semantic domains of the construction industry with particular reference 
to BIM-based information management processes (He et al., 2021; Tomalini, 2022). Created initially as a method 
for exchanging data between different silos, BIM today is increasingly being approached with concepts such as 
big data, IoT, and AI, which are seen as potential solutions for automation and inclusion of broader environmental 
contexts. The evolution of interoperability solutions, from ISO STEP to IFC to IFCOWL, is leading to the 
transformation of a static BIM to a new web-based paradigm. 


In fact, the information model must be able to accommodate and store not only data-at-rest, produced in the survey 
and/or design phases, but also to manage data-in-motion, coming in real-time from devices for monitoring the 
environmental quality of architectural and urban spaces. However, the inclusion of IoT sensors within an asset 
should only be considered as a starting point for the implementation of a Digital Twin (Sacks et al., 2020). Indeed, 
there is no single DT solution, as this may vary from time to time based on specific needs, just as its implementation 
is subject to evolve over time. 


Thus, the use of BIM methodology, and its extension to the Digital Twin concept, can bring enormous benefits for 
those involved in FM as it allows: planning of asset management systems and effective cost estimation; prediction 
of operational problems and improvement of maintenance activities; increased accessibility and security of 
information; reduced waste; and more reliable documentation (Singh et al., 2021). On the other hand, for IoT 
devices, many communication protocols and semantic models are available to support information exchange 
between devices but there is only partial integration with the IFC schema for the building. 


For example, some studies have focused on integrating open protocols such as BACnet and the IFC schema, 
creating specific MVDs to represent BAS information within BIM models at different stages of the building 
process (Tang et al., 2020). Even the use of extensions to the IFC schema, however, is still an immature process 
and provides only limited support for the integration of this information (Wang et al., 2022). In addition, the IFC 
format is designed for transferring data from one tool to another and is therefore not meant to be dynamically 
modified or transformed. The definition of Linked Data (LD) and the Web Ontology Language (OWL) has recently 
tried to address these issues. 


Given the limitations of the IFC schema some organizations have begun to develop alternative or complementary 
data schemes such as the Brick Schema, which provides a semantic description of the physical, logical, and virtual 
assets within a building and their relationships. This schema is defined using the Resource Description Framework 
(RDF) and is thus integrable with Semantic Web standards and Linked Open Data. Within it, classes such as sensor 
or plant-type equipment find broader description, and since version 1.3 the schema also includes support for linking 
Brick models and the sensor network with communication protocols such as BACnet, facilitating the achievement 
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of a digital twin. An IFC model can contain within it a link to the Brick model via a unique identifier contained 
within the [fcLibraryReference instance, so that an external platform can retrieve information from both schemas. 


3. A FRAMEWORK FOR APPLYING DIGITAL TWIN TO BUILT ASSETS 


A processing framework is defined for the collection and management of data-in-motion from IoT and subsequent 
integration with data-at-rest allocated in a BIM-based common data environment, aimed at the creation of a Digital 
Twin of existing real estate assets, oriented to subsequent developments of big data analytics through AI 
applications. It is thus intended to support the decision-making processes of the different operators, owners, facility 
managers, technicians and experts involved in the operational phases of buildings and to be able to plan planned 
and/or corrective maintenance actions, generate content, recommendations, best practices and formulate forecasts 
on managed assets. Particular insight will then be conducted for the optimization of integration processes between 
BIM and IoT in relation to interoperability and data-set exchange issues in the creation of DTs, and to enable real- 
time visualization of monitoring data from BIM models for Facility Management. 


The proposed approach involves the implementation of BIM models from the data and information held by the 
owner, which manages the real estate assets, whether of geometric (2D/3D), alphanumeric or documentary type, 
enriched with the additional semantic content useful for the subsequent management phases. Upstream of this 
operational phase, however, an in-depth analysis must be developed of the activities carried out within the same 
estate property concerning the maintenance and functionality of the assets in terms of actions carried out, resources 
and available infrastructure, aiming to define the organization's information requirements (OIR), to which all the 
information exchanges that will govern the various management processes must be informed. This should converge 
in the compilation of a BIM Guide for the implementation of the organization's asset information models, which 
can standardize their delivery processes among the various internal operators, or external suppliers, following 
specific standards and best practices adopted by the company. 


Te anom ot To nsnm 
nee NE Dacron ae 


Ea koon 
| sen “@ nyo 
© aop 
‘© mon z 4 = = 
4 = m 
eem 0. awin TT” 
= n en 
. EPa | peoa panem 
LESS Coston} 
aT [fee aterm 
© aoaie *@ novation © =e 
ely - — 
"oitiva 
Ta nsurge j 
L ‘e nostr | 
ro oye 


Fig. 4: Derivation path of IfcObject and IfcTypeObject inside IFC schema 


The addressee of this research is a public institution and consequently it is necessary to think not only in terms of 
proprietary formats but how the various building elements and their information should be modelled and delivered 
to the maintainer. After compiling the OIRs the next step is therefore to draw up matrices based on the Level of 
Information Need concept that allow us to catalogue all the assets within individual buildings so that we have a 
complete technical record of the minimum information to populate our models. This also allows us to begin to 
create a library of general BIM objects that can be used in any building, and associate to each of these the relative 
class of the IFC schema with the appropriate psets necessary to convey the required information. A first reasoning 
must be made on the naming convention to be adopted for the creation and use of such objects, since if we look at 
the proprietary Revit software, for example, we find the distinction between family name and type name, while 
within an IFC model we can observe many instances of the same object (/fcObject), each with its own name, 
typically distinguished by a progressive number, and different types (/fcTypeObject) which instead perform the 
function of template for the assignment of homogeneous parameters for that grouping of objects (fig. 4). Types 
and instances in this case are both derived from the superclass /fcObjectDefinition and therefore have a horizontal 
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relationship with cardinality [1:N] and not a vertical dependency as proposed by proprietary software. This entails 
the assignment of different names as the former must be understood in a generic context regardless of the project 
([fcProject) in which they are placed, while the latter have their own identity when placed within a floor 
([fcBuildingStorey), or a room ([fcSpace), and consequently are assumed to require, in a management type 
environment, a code capable of representing this position and not being generic for the whole building. The second 
reasoning must then be done on the parameters relating to these entities. Finding a correspondence between the 
parameters already in use by an organisation and those prepared by the IFC schema is not always trivial or easily 
represented by the tools in use today. Where it is not possible to find this relationship, in fact, custom psets created 
within the BIM authoring software will be used to meet the needs of the client, knowing that they may be subject 
to change over the years as a result of future changes to the schema proposed by buildingSMART. 


In parallel, a data environment will be set up to collect, index and process data from a number of IoT devices 
deployed within the assets (Fig. 5). These devices are to be chosen based on the observations presented in the 
previous chapters and will need to enable real-time collection of environmental, or other designated, information. 
There will then be a breakdown by layers, where the physical layer will be represented by the device itself; the 
latter will send the collected data to one or more gateways, or directly to the cloud, for indexing and aggregation 
of the information (Data Storage Layer). Then the data, depending on the type, can be integrated within appropriate 
databases (Data Integration Layer) and retrieved through appropriate processes or APIs within software 
applications for real-time analysis and querying. 


Gateways Intemet Data center 
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Networks 


Internet Stack 


Visualization APIs 


Fig. 5: IoT framework 


Aware of the impossibility of directly integrating dynamic information within a BIM model, and in particular 
within the IFC schema, it is necessary to adopt an intermediate platform that acts as a collection and processing 
point for all the data collected. The workflow envisages the inclusion within the BIM models of digital 
representations of the sensors (JfcSensors) that will be inserted into the real environment with a series of parameters 
describing their characteristics and the use of a unique identifier (/fc7ag) that can be used as a key for association 
with external DBs. This reference must necessarily be the same as the one present in the DB created from the data 
collected by the various sensors. The use of a cloud-based platform finally allows us to combine these two 
databases, representative of two different semantic domains, for the creation of customised queries and dashboards 
useful for the maintenance of an entire portfolio of assets. 


3.1 The case studies 


A PNR EU Next Generation research project, entitled “B/M-to-Digital Twin: information management to support 
decision-making in the building life cycle”, has been initiated as part of a collaboration with the University of 
Florence's Building Area, with the aim of developing information management of built assets belonging to the 
university's real estate stock through the implementation of BIM information models aimed at facility management. 


The Building Area is divided into three Process Units - Real Estate, Building Plan, Ordinary Maintenance - in 
addition to Administrative Support, to which are added two specialized services called “Fire System Management 
(GSA)” and “Control and Maintenance of Asbestos Containing Materials”. In particular, the tasks of the Ordinary 
Maintenance PU are those of planning and scheduling of ordinary maintenance interventions, coordination of 
technical referents allocated in the various territorial offices, monitoring the need for programmable maintenance 
interventions and requests for urgent interventions, and coordination with the Property and Logistics Services Area 
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in cases of integrated interventions. 


Facility management operational services managed by Ordinary Maintenance UP are divided into: 1) Maintenance 
services, 2) Cleaning and environmental hygiene services, and 3) Reception and porterage services. Each 
operational service includes various activities that are divided into: ordinary activities (predefined or 
supplementary) or extraordinary activities (breakdown or on-demand). Maintenance Services include all activities 
aimed at maintaining the functional state and preservation of the building's systems and construction components. 
Specifically, the categories of systems managed as different maintenance services are the following: Electrical 
system, Water system, Heating system, Air-conditioning system. Elevator system, Firefighting system, Security 
and access control system, Networks, Minute building maintenance. 


Fig. 6: Layout from BIM model of an asset of University of Florence 


The managed real estate portfolio is characterized by numerous assets, different both in terms of function, 
construction, system, etc., and historical-architectural value, which require different intervention methodologies 
and maintenance systems from case to case. Precisely with regard to management and maintenance processes, the 
University has been equipped for some years now with a dedicated IT tool for the management of its real estate 
assets, namely Infocad.FM by Descor s.r.l., a corporate partner in this research. The backbone of this tool is the 
centralized and standardized Technical Registry within which all the information (documents, data sheets, CAD 
plans, photos, etc.) used, by those who interact in various capacities with the properties, converge. 


The buildings in the archives are subdivided by municipality and geographical area. Each building is also currently 
accompanied by all the patrimonial and urban planning documentation that distinguishes it (cadastral extracts, 
property titles, lease or loan contracts, etc.). Despite this rationalization effort there are still many difficulties in 
the information management of these assets mainly due to the heterogeneity of the available data and information, 
since in many cases these documents are still in paper format. Therefore, there is a need to move to an information 
management using BIM tools and methodologies, so that data and information about the building are produced, 
managed, stored and exchanged in a secure, reliable and consistent way within a CDE, which allows not only the 
uploading of files but also the writing of metadata related to them (Paparella & Zanchetta, 2020). 


The launch of the research project saw the selection of pilot cases and the implementation of a number of built 
asset information models, which will serve as the test bed for the testing of IoT models and the subsequent 
development of Business Intelligence (BI) to support facility management activities. Specifically, the asset model 
of each building is organized into federated models related to the different disciplinary domains-architectural, 
structural, and systems to facilitate the management of the model's information content in the subsequent stages 
of data extraction for technical performance simulation (fig. 6). The various modeling stages are developed 
depending on the type of geospatial data available through CAD-to-BIM and/or Scan-to-BIM processes and 
through the use of Autodesk Revit BIM authoring software, which will be followed by export to IFC format, 
mapping all elements to the correct schema class. 


A phase of analysis of the history of the maintenance actions carried out by the Ordinary Maintenance UP and the 
related costs incurred for each building investigated is also underway, in order to define strategic lines of 
improvement in the FM of the managed assets to be conducted through the preparation of an IoT model for 
environmental monitoring, energy consumption and in general the efficiency of the facilities. In particular, this 
will make it possible to identify those spaces within the buildings that, in relation to their conditions of use, can 
be a source of useful data to be acquired by sensors (temperature, humidity, pressure, air quality, movement, 
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brightness, etc.). 


4. CONCLUSIONS AND FUTURE DEVELOPMENTS 


The use of BIM tools and methodologies in the information management of the operational phases of a built asset 
with its extension to the Digital Twin concept can bring enormous advantages from an economic and management 
perspective for the owner entity and more specifically for facility managers. 


The "BIM-to-Digital Twin" research project, to date in its start-up phase, aims to implement a Decision Support 
System (DSS) within Facility Management activities through the integration of BIM platforms and IoT technology, 
geared toward subsequent developments of big data analytics and AI applications. The system should 
accommodate different types of structured and unstructured data from different sources and enable integration with 
the IoT model identified for the specific experimentation conducted in the various case studies identified. 


This contribution aimed to develop a broad survey of the various approaches available and the problem still open 
for defining an information technology architecture, which supports the implementation of IoT sensors that can be 
integrated with the BIM information models of built assets for more efficient management of maintenance and 
operation activities implemented by owners. Despite the wide variety of communication protocols and semantic 
models for information exchange between IoT devices there is still only partial integration with the IFC scheme 
for building classification. Extensions of the IFC schema still provide immature and limited support processes in 
integrating this type of data. In fact, the IFC format was not designed for data-in-motion transfer from one tool to 
another and thus to be dynamically modified. The definition of Linked Data (LD) and the Web Ontology Language 
(OWL) has recently tried to address these issues. 


Promising developments can be recognized in the Brick Schema where within it classes such as sensor or plant- 
type equipment find wide description, and connections between Brick models and the sensor network with specific 
communication protocols are also included, facilitating the implementation of Digital Twins. 
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COMBINING LARGE-SCALE 3D METROLOGY AND MIXED 
REALITY FOR ASSEMBLY QUALITY CONTROL IN MODULAR 
CONSTRUCTION 


Wafa Bounaouara, Louis Rivest & Antoine Tahan 
Ecole de Technologie Supérieure (ETS), Montréal, Quebec, Canada 


ABSTRACT: The quality control (QC) of assembled modules is an essential process when constructing modular 
buildings such as hotels and hospitals. Defects that go undetected during module assembly may result in lost 
productivity in the form of unnecessary transportation, rework or project delays. OC has traditionally been 
performed using specialized tools and carried out a posteriori in an inspection station dedicated solely to this task. 
Nowadays, large-scale 3D metrology technology provides a more efficient alternative since it enables accurate 
measurements to be taken in situ. Additionally, mixed reality (MR) supports the immersive projection of 
information and guidance instructions. This paper introduces a proof of concept of a framework that combines 
industrial photogrammetry with the HoloLens 2 MR headset to assist with assembly and QC during the off-site 
construction phase of modular construction. Many tests were conducted in a laboratory and a factory setting to 
evaluate the systems user-friendliness and possible challenges associated with its future implementation. The 
experiments conducted confirmed that combining 3D metrology with MR offers an interesting solution for 
integrating QC into the assembly process. However, further work is needed to enhance the measurement workflow 
and optimize the measurement system 5 accuracy. 


KEYWORDS: 3D Metrology, Augmented Reality, Mixed Reality, Modular Construction, Photogrammetry. 


1. INTRODUCTION 


In modular construction, volumetric building units, or modules, are factory-built with almost complete interiors 
including plumbing, electricity, insulation and even furniture. They are then transported to the construction site for 
final building assembly, which basically involves stacking the pre-assembled modules together. The modules’ 
structure can be made of wood, steel or a combination thereof depending on the customer’s requirements and the 
final height of the assembled building. 


During module assembly in the factory, the control of key characteristics (e.g., overall dimensions, squareness 
error, parallelism of the ceiling and floor) is essential to ensure the module complies with the specifications and 
estimate the adjustment shims needed when stacking modules on-site. Traditionally, this process has been carried 
out manually using tools like measuring tapes and levels in the assembly stage and controlled a posteriori at an 
inspection station dedicated solely to this task. If necessary, adjustments and corrections are then made, which 
leads to lost productivity in the form of unnecessary transportation, rework or project delays. 


The advent of contactless 3D metrology provides a valuable alternative since it makes it possible to take accurate 
measurements in situ, while building information modeling (BIM) provides a 3D digital representation of as- 
designed buildings. This combination of technologies thus automates inspection and digitalizes QC information. 
QC automation in the prefabricated construction industry has been addressed in recent research. Bae and Han 
(2021) proposed a vision-based approach to off-site quality inspection that reconstructs 3D point clouds using a 
projector camera system and computes how much scans deviate from the virtual model to generate quality 
assessment error maps. Kim et al. (2019) proposed to use a registration-free mirror-aided laser scanning approach 
to inspect the dimensions and geometric requirements of planar prefabricated elements. Xu, Kang, and Lu (2020) 
used laser scanning reconstruction technology to inspect surface defects in prefabricated concrete elements. The 
information they collected during QC was then stored in accordance with the Industry Foundation Classes (IFC) 
standard and integrated in a BIM platform. 


In industry, advances in 3D measurement systems have made it possible to incorporate inspection in the assembly 
process, which is referred to as measurement-assisted assembly (MAA). The term MAA is used to describe any 
process that involves measurements being used to guide assembly and QC (Muelaner, Kayani, Martin, & 
Maropoulos, 2011) 


MAA was first introduced as a paradigm shift for the assembly of high-quality large-scale complex structures like 
aircraft frames to eliminate the monolithic jigs and manual specialized tools that were usually involved in large 
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flexible-component assembly (Maropoulos, Muelaner, Summers, & Martin, 2014). This paradigm shift was 
motivated by advances in large-scale 3D metrology systems that made it possible to take measurements during 
fabrication or in situ. This is particularly beneficial because large-scale structures are often too big to fit into 
conventional measuring devices or be transported to calibration laboratories (Schmitt et al., 2016). While the 
construction industry requires less accuracy than the aerospace industry, the same concept can be used to integrate 
QC in the assembly process of modular and prefabricated structures. 


On the other hand, workers need to keep their hands free during assembly to move around and carry their tools. 
Augmented reality (AR) and mixed reality (MR) technology can make a significant contribution here. They merge 
computer-generated information with real-world sensations using a device (e.g., a head-mounted screen, a 
projector, or a tablet) that provides an immersive user experience and eliminates the need to constantly look at 
fixed screens for information. In the case of MR, the user can interact with the virtual objects (Peddie, 2017). MR 
can be seen as an evolution of AR that has been made possible by technological advances in sensors and imaging 
techniques (Park, Bokijonov, & Choi, 2021). 


Various studies have evaluated applying AR and MR to assembly tasks and inspection processes. Qin et al. (2021) 
investigated whether it was possible to use head-mounted AR displays for wood frame assembly tasks. Ahn, Han, 
and Al-Hussein (2019) proposed to use a projection-based AR system to provide workers with visual guidance 
during manual panel assembly. Their system projected as-designed models (panel drawings) into the assembly 
station. Kwiatek et al. (2019) demonstrated that using a mobile AR application in conjunction with 3D scanning 
during pipe section assembly and inspection improved productivity, reduced the amount of work that needed to be 
redone, and enhanced workers’ spatial skills. Talamas (2017) evaluated using MR interfaces to automate the 
metrology process flow for in-line assembly process inspection and found that each volunteer made fewer errors 
when using the MR interface than paper or laptop instruction guides. 


The aim of this paper is to propose a proof of concept of a framework that combines 3D measurement technology 
and MR to integrate QC in off-site module structure assembly. The purpose is to enhance productivity during the 
assembly process, provide more accurate measurements, and ensure quality output traceability. 


2. MATERIALS AND METHODS 
2.1 Measurement Equipment 


In this research, we focus on assembling a cuboid wood frame structure formed of a floor, four walls and a ceiling. 
We suppose that the quality of these six parts was previously controlled. To control the quality of the assembly 
process, we measure the 3D position of a set of points that will make it possible to control the key characteristics, 
(KCs) the overall dimensions, squareness error, parallelism of the ceiling and floor, etc. These positions will then 
be compared to the data represented in the computer-aided design (CAD) model. Fig. 1 illustrates the wood frame 
structure and the set of control points. 


jI ii WMA) ssn. 
ci EA r 
Ti || | 


Fig. 1: Wood frame structure 
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SECTION E - ADVANCED TECHNIQUES FOR THE CONSERVATION AND MANAGEMENT OF BUILT ASSETS 


Photogrammetry is a technology that is based on the principle of optical triangulation. An element is positioned in 
3D space by at least two cameras that are used to identify targets from different viewpoints. The targets reflect 
infrared light. The cameras can then capture the position of the targets and position them in relation to the 
measurement system reference. Multiple targets can be perceived simultaneously, and their positions can be 
determined dynamically for real-time target tracking (= 10 Hz). Photogrammetry is used in many sectors for 
dimensional inspection. In our research, we use its tracking capability to provide 3D measurement data to assist 
with the assembly process. C-Track is a photogrammetry device from the company Creaform® that has a 
measurement volume of up to 16 m°. Its measurement range can be extended by combining up to four devices to 
form a measuring system around assembly stations. This eliminates the need to transport measuring equipment 
from one station to another. Additionally, C-Track can be integrated with portable scanners to probe or scan specific 
geometric elements as required. 


However, the retroreflective targets that are commonly used with C-Track, such as stickers and magnetic artifacts, 
are unsuitable for wood framing and cannot be accurately calibrated in the CAD model. To address this limitation, 
customized artifacts have been developed. The artifacts are designed to be easily attached to a wood frame. Each 
artifact is composed of three retroreflective targets (C1, C2, C3) to locate a point in 3D space. The targets have 
different spacing to be able to easily differentiate them. Fig. 2 shows a sample artifact developed for tracking the 
upper corner of a wall. 


Fig. 2: Sample artifact 


2.2 Measurement System Setup 


Prior to initiating the inspection process, certain preliminary operations must be conducted to prepare the 
measurement system. These operations are: environmental referencing, alignment (or registration), and tracking 
model creation. 


Environmental referencing: This step serves to establish a frame of reference for C-Track within the real 
environment. The working environment is identified using retroreflective targets (the targets shown in blue in Fig. 
3 (a)). These targets are then registered using the primary C-Track (if multiple C-Tracks are used) and exported as 
a reference file. Once this has been done, measurements can be taken by moving C-Track within the referenced 
environment. 


Alignment: To be able to compare measured point coordinates with the CAD model, the C-Track needs to be 
aligned within the 3D CAD volume. This involves creating reference entities that align the instrument in the 
metrology software workspace with the real-world instrument. We utilized the floor as a centerpiece. 
Three perpendicular planes were probed and used to define the measurement coordinate system (as illustrated in 
Fig. 3 (b)), which was then aligned with the CAD model. Creaform’s HandyPROBE portable probe was used with 
C-Track for this alignment process. 


Tracking model creation: The tracking model refers to the collection of retroreflective targets that the C-Track 
system dynamically tracks. To create this model, the targets comprising the tracked artifacts are registered using 
the C-Track. Fig. 3 (c) illustrates this process. The acquired targets are then exported as a text file for future use 
during the inspection process. 
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(b) Aligning the measurement 
coordinate system with the CAD model 


(c) Creating the tracking model 


Fig. 3: Measurement system setup 


2.3 Data Processing 


The raw data collected by C-Track has to be processed and interpreted in order to extract KCs and interpret the 
measurements in line with the technical specifications. VxElements® software was used with C-Track for data 
acquisition. For data post-processing, inspection software offers a wider range of tools. In this project, 
PolyWorks Inspector® was used to extract dimensions from a CAD model, import measurement data and compare 
the measurement data with the CAD data. Moreover, macro scripting was used to automate the dynamic 
measurement process workflow. 


For each artifact (see Fig. 2), C-Track provides the (x, y, Z) positions of the three targets (C1, C2, C3). It is thus 
possible to create a local coordinate system (£) attached to the artifact. Once the geometry of the artifact is known, 
the position of a point of interest (for example, an upper corner of a wall) can be determined in the local coordinate 
system. Geometric transformation makes it possible to determine the point of interest’s position in the global 
coordinate system (the measurement coordinate system). This logic is computed via macro scripts in 
PolyWorks Inspector®. The detailed method is explained below. 


The first step is to identify the targets, which involves associating them with each point of interest and 
distinguishing each artifact’s different targets. To identify the targets that correspond to each artifact, we calculate 
the Euclidean distances between the nominal position of the corresponding point of interest and the targets. The 
three targets that lie within a sphere of radius 10 cm around a point of interest are considered the targets associated 
with the relevant artifact. If one or more targets are missing, an error message is displayed. This approach is based 
on the assumption that the part (the wall, for example) is positioned roughly at its nominal (CAD) position. Once 
the three targets associated with each artifact have been identified, the three targets are distinguished by comparing 
the distances between the targets based on the artifact’s geometry. Equation 2.1 provides the vector calculations 
for creating a local coordinate system £ (tp, Jy, ke) around each artifact using the positions of the three targets (C1, 
C2, C3). 


i= Cz — Cy 
ee, -G Il 


> C3 — Cy k, oS 21 
a Jp ; = ÚA ; 
Je ICG- G, |l r} ele 


The position of the point of interest in the global coordinate system g, P, = (xg, Yo Zg), is calculated by geometric 
transformation from the position of the point of interest in the local coordinate system £ (Pp = (xp, ye, ze)). This 
transformation is represented by Equation 2.2: 
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Pg = Tye Pe 


where T, is the transformation matrix from the local coordinate system (£) to the global coordinate system (g). 
To represent this transformation using homogeneous coordinates, Equation 2.2 is equivalently expressed as 
Equation 2.3: 


Xg M11 M12 M3 ty] ~% 
Yal _|%21 T22 123 ty! JM 23 
Zg T31 T32 «+133 tz| |Z 

1 0 0 0 1 1 


Equation 2.4 donates the rotation matrix: 
Ta M2 113 x 
T21 T22 123| =|yīg vig Ykg 2.4 


The translation vector corresponds to the coordinate of the origin of the local coordinate system in the global 
coordinate system as donated by Equation 3.1: 

ty xclg 

le] =4yclg 2.5 


2.4 User Interface 


As mentioned above, this project uses MR to project dynamic measurements for a user during wood frame structure 

assembly. The HoloLens 2 headset is a completely standalone head-mounted display (HMD), which means that it 

doesn’t need to be connected to a separate computing device. In addition, it is MR-based, which means that it 

enables users to interact in real time with digital content that is superimposed on the real world. The content takes 

the form of holograms, and the holograms interact simultaneously with the user and the real world. Furthermore, 

HoloLens 2 enables users to interact with holograms using voice commands, hand gestures or eye movement. 

What’s more, the technology makes it possible for two users in different locations to see what the other sees, which 
enables one to guide the other through a process or simply interact with the world the other sees. This feature has 
the potential to make remote collaboration easier, more efficient and far more interactive. In addition, an MR plug- 
in exists for PolyWorks Inspector that makes it possible to manipulate an inspection project directly on HoloLens 2. 
Macro scripting can also be used to customize the user interface displayed by HoloLens 2. 


The user interface must provide the operator with the values of the KCs measured to guide the adjustments to be 
made and ensure the geometric quality of the assembly. In addition, it must enable the operator to perform certain 
control commands, such as exporting results and navigating between different inspection stages. 


The user interface proposed has two components (Fig. 4): (1) the menu or toolbar, which is composed of 
three buttons that are each associated with a command Button 1 launches dynamic measurement, Button 2 exports 
the measurement results, and Button 3 shows or hides the CAD hologram; and (ii) annotations to display the 
measured value’s deviation from the nominal value along the x, y and z axes. The annotations are displayed at 
the nominal position of the corresponding point of interest and change color depending on whether the measured 
value is within or outside of the tolerance interval. Note that the functionalities of the PolyWorks AR plug-in for 
HoloLens 2 were used to align the CAD hologram with the real environment. 
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Data processing and real-time annotation updating 
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results 


Fig. 4: User interface created for the HoloLens 2 headset 


3. LABORATORY TESTING 


In order to conduct tests in a laboratory setting, a scaled-down module had to be constructed that adhered to spatial 
and ergonomic limitations. The scaled module replicates a typical wood frame structure and is around 1/8th the 
size of an actual module. The module was designed using CAD software. Dimensional and geometric data was 
present in the CAD model and served as reference information for the nominal (as-designed) model. The measured 
data was subsequently compared with this nominal data. 


3.1 Experimental Setup 


The floor was placed on a granite surface plate and secured to the table with a weight to prevent it from moving 
(see the picture on the right in Fig. 5). Next, we acquired the reference targets, performed alignment and acquired 
the tracking model as explained in § 2.2. 


In an industrial context, adjustable braces are used while assembling walls. A turnbuckle system was built to 
replicate this for the experiment and allow the operator to easily adjust the position of the wall while tracking the 
position of the point of interest (see the picture on the left in Fig. 5). 


Fig. 5: Experimental setup 
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3.2 Gage Repeatability and Reproducibility Study 


A measurement may be influenced by various sources of variation during the inspection process, which results in 
there being uncertainty associated with each measurement result. Measurement uncertainty is a quantitative 
assessment of the unreliability associated with a measurement result based on probability distributions. The 
dispersion of a set of measurements of a quantity can be characterized using the estimator of its standard deviation, 
which is also known as standard uncertainty (0). 


Studying the overall uncertainty of the measurement system and evaluating whether the system is able to accurately 
detect quality defects requires more in-depth study that is beyond the scope of this project. In this study, we intend 
only to evaluate the amount of variation in the measurement data that is attributable to the measurement system in 
the configuration proposed. Measurement system variation consists of two important factors, repeatability and 
reproducibility (R&R). Repeatability is related to equipment variation, whereas reproducibility is related to 
inspector or operator variation. Measurement system variation can be assessed by conducting a Gage R&R study, 
which involves data being collected by having multiple operators measure the same set of parts in a random order. 
Several methodologies can be used for statistical analysis of the data obtained from the Gage R&R study. We chose 
to use the ANOVA method, which breaks down the sources of measurement system variation as follows:(1) part- 
to-part: variation originating from the parts being studied; (2) reproducibility: variation originating from the 
operator(s); (3) operator/part: variation arising from the operator(s) interacting with the parts; and (4) repeatability: 
variation that originates from the measuring system and cannot be attributable to other sources of variation. The 
purpose of measuring multiple parts is to evaluate manufacturing method variation, which is also beyond the scope 
of our research. Therefore, only one part is measured in this study and, thus, variation sources (1) and (3) listed 
above (which are usually provided by ANOVA) are not taken into consideration in our analysis results. 


3.3 Test Sequence 


Two operators were asked to repeat the positioning of a wall while tracking the position of the wall’s two upper 
corners. An operator begins by wearing the MR headset and opening the project. The interface displays a hologram 
of the CAD file with the wall in its nominal position along with a three-button menu, the latter of which is described 
in § 2.4. The operator then positions the wall approximately in its nominal position and attaches it to the floor and 
the adjustable assembly brace system. They then fasten the artifacts to the top corners of the wall. Afterwards, they 
start dynamic measurement by pressing Button 1. The annotations are displayed to indicate each point of interest’s 
positioning error on the x, y and Z axes. 


The operator begins by adjusting the wall’s position along the x-axis in accordance with the deviation values 
displayed in the annotations for Points | and 2 (see Fig. 5). Once the x coordinate is within tolerance, the operator 
adjusts the position of Point | along the y and z axes by adjusting the adjustable brace. When Point 1’s position 
is within tolerance along all three axes, the annotation changes from red to green. The operator then moves on to 
Point 2 and adjusts its position along the y and z axes. Once both annotations are green (see Fig. 5), the operator 
stops dynamic measurement by pressing Button 2, which automatically exports the values to a text file. Each 
operator repeated the sequence 37 times, with six measurements captured each time: Pointl _x, Pointl_y, Pointl _z, 
Point2_x, Point 2_y and Point2 _z. 


3.4 Results and Discussion 


The data collected was analyzed using the ANOVA method. Table 1 indicates the repeatability and reproducibility 
standard deviation of x, y and z for Points 1 and 2. The precision-to-tolerance ratio P/T is expressed by 
Equation 3.1: 


60 


= —— 3.1 
USL — LSL 


P/T 


where USL is the upper specification limit, LSL is the lower specification limit, and ø is the standard deviation 
of the measurement error. 
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Table 1: Gage R&R results 
Point 1 Point 2 


x y Z x y Z 

Orepeatability 0.335 0.487 0.152 0.325 0.413 0.089 

Oreproducibility 9.000 0.019 0.028 0.000 0.237 0.029 
ORaR 0.335 0.487 0.154 0.325 0.476 0.093 
P/T 33.5% 49% 15% 33% 48% 9% 


The results show that the maximum total standard deviation is 0.487 mm. This means that 95% of the time, the 
measurement value varies by no more than 407 ota) = 1.948 mm (+26rota = +0.974 mm), which is below the 
maximum allowed tolerance range of 6 mm (+3 mm) indicated in the technical specification. However, for a 
measurement system to be considered “good”, its precision-to-tolerance ratio must be <10%. (Note that 
10%<P/T<30% is borderline, and P/T>30% is unacceptable). 


It should be noted, however, that system variation is influenced by the fact that the operator is asked to position 
each point within an interval of +1 mm. This can be confirmed by the fact there is less variation along the axis-z 
since this value is the least impacted by the positioning interval. In fact, when the operator adjusts the braces, the 
wall’s position along the z-axis almost doesn’t change. In addition, given the experimental setup, the y-axis 
corresponds to C-Track’s depth axis. A study of the effect C-Track’s depth of field has on measurement system 
variation showed that the depth of field has a significant effect on system repeatability (Emond-Girard, 2022). In 
all cases, the system repeatability error is greater than the reproducibility error, which means that the greatest 
source of variation is the measurement system itself, not operator manipulation. 


4. FACTORY TESTING 


In order to assess the user-friendliness of the proposed system and to highlight potential constraints linked to its 
use in an industrial environment, a test was carried out under real working conditions. Below is a description of 
the experimental setup and test procedure used, as well as the findings and observations noted following the test. 


4.1 Experimental Setup and Test Sequence 


A near-real-size module was designed to perform factory testing. The artifacts were also adjusted to a true scale. 
All system setup steps, including referencing the environment, aligning the measurement system with the CAD 
model’s global coordinate system, creating the tracking model, and aligning the CAD hologram with the real 
environment were first completed by the research team. Then, we explained to the operators assigned to take part 
in the tests how the system worked. Three (3) operators then worked together to pre-assemble the module and add 
the adjustable wall braces. One operator attached the artifacts and put on the MR headset to begin positioning the 
wall beginning with Point 1. Since the annotation displayed on the MR headset was small, the operator had to 
climb a ladder to read the value. The other two operators adjusted the position of the wall following the instructions 
given by the operator wearing the MR headset. The process was repeated until the position of Point 1 was within 
the desired tolerance range. The process was then repeated for Point 2 following the same procedure. Fig. 6 shows 
the test sequence. 
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4.2 Observations and Discussion 


The preliminary actions involved in referencing the environment can be quite demanding in an industrial context. 
The referencing targets that C-Track requires to locate itself may move due to vibration and operator movement. 
In addition, during the assembly stage, the floor often moved (drilling of the floor to fix the adjustable wall braces, 
operator movement, etc.), which made the proposed registration method (alignment of measured data and CAD 
data) not well suited to the real context. The operator who used the HoloLens 2 headset reported that it was easy 
to use and comfortable to wear and work with. However, during the test, some drawbacks were noted with the user 
interface, such as the size of the annotations, which was deemed to be too small. Also, we had to explain to the 
operator the orientation of the axes and how to interpret the values displayed in the annotation because the 
coordinate system axes and displacement vectors were not indicated. 


5. CONCLUSION 


In this research, we were able to achieve a proof of concept of a system that combines industrial photogrammetry 
and MR to assist operators during the assembly of a wood frame structure. Points of interests were tracked and 
compared to the nominal data presented in the CAD model. The deviation between the nominal value and the 
measured value was projected on the real assembly through the MR headset. The proposed solution also supported 
the documenting of measurement results. 


Although the Gage R&R study results showed that the measurement system varied less than the allowed tolerance 
range indicated in the technical specification, its precision-to-tolerance ratio is still higher than what is 
recommended. The error associated with artifact fabrication can contribute significantly to the system’s variation; 
thus, further research should work on optimizing the artifact and using the artifact’s as-manufactured geometry 
when computing the measurement data to minimize the amount of variation generated by the artifacts. Further 
research must be done to evaluate the measurement system’s overall uncertainty and validate the system’s ability 
to detect nonconformities. 


The test completed in the industrial context helped to identify some drawbacks related to the proposed solution, 
and, thus, this research serves as a guideline for potential future implementation of a quality control system that 
combines 3D metrology and MR for integration in an assembly process. 


The proposed measurement setup was complex and time-consuming, and keeping the reference targets stable 
seemed to be a struggle during factory testing. In order to use photogrammetry in in situ measurement-assisted 
assembly, fixed reference entity has to be integrated in the plant layout. Other large-scale technologies like indoor 
global positioning system (iGPS) could be investigated as an alternative type of measurement equipment. Also, 
improvements could be made to the user interface to provide the operator with more visual guidance and a more 
ergonomic way to display the measured values could be sought. 
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BUILDING’S TWIN RECONSTRUCTION 


Cecilia Maria Roberta Luschi & Alessandra Vezzi 
University of Florence UNIFI, Department of Architecture DIDA, Florence 


ABSTRACT: The work shows a process that starts from the digitization of cultural heritage and through analysis 

arrives at the subsequent diachronic holographic representation. The object of the study was the creation of two 

holograms of historical buildings: the church of St. Maria in Sovana with an interesting subsoil and the ruin of 
church of St. Francesco, attributed to Sangallo il Giovane in Pitigliano. However, the theoretical setting of the 
research is placed at due distance from the twin term. It has implications and meaning that are not matched by 
producing a digital copy, even a very high resolution one. The use of different cognitive technologies and the 
assembly of the various inputs in a digital model can never be defined as a twin of the original. Bearing in mind 
what has just been specified, the work was organized according to different levels of acquisition of cognitive data. 

Obtained from the survey and from historical studies the shape of the original state, a historical narrative 
interweaving of digital models has been created. The hologram is a three-dimensional and dynamic representation, 

which in the case of the church of San Francesco shows the reconstruction of the ancient architectural complex, 

visualizing the evolution of the studies of historical documents, up to the vision of today's ruin. For the church of 
S. Maria, the underground area is reconstructed through the geoelectric analysis of the subsoil and of the urban 

composition. Use of high detection technologies (GPR and HVNSR). The holographic of the artifact, promotes 

scientific divulgation and its dissemination sharing experience, it is not realized through device indeed, but thanks 

a holographic display. 


KEYWORDS: Digital twining of cultural heritage, hologram technology, scientific divulgation, dissemination - 
Diocesan Museum Palazzo Orsini of Pitigliano (GR). 


1. INTRODUCTION 


The research activity concerning the case of the church of St. Maria in Sovana (GR) and the ruins of the church of 
St. Francesco in Pitigliano (GR) starts by questioning the real possibility of making a digital twin of any object or 
structure especially in relation to the concept of cultural heritage. Starting from the awareness that nothing can be 
perfectly reproduced, both in structure and texture, even more we find ourselves in the impossibility of being able 
to observe any artifact divorced from its concrete and real contextualization. 


The highest resolution survey and the most advanced technologies can never produce the real in the sense of the 
twin as we would like it to be. Much less can digital reproductions be correlated with the originals of which they 
are a partial copy. The theoretical setting of the research thus identifies in the term twin, incongruence between 
signified and signifier; twins in fact have a usual nature, it would instead be more appropriate to speak of digital 
copies to already have in mind the intrinsic limitation of the action. The misunderstanding, evidently intended, 
may not seem foundational, but we think at the level of scientific setting it is. 


To return to the question of architectural research having clarified the position, albeit in a consciously non- 
exhaustive way, the paper proposes a twin action of architectural study and composition. That is aimed at 
identifying the usual nature of the project idea starting from artifact, investigating it according to the compositional 
rules of architecture referring to different historical periods. Therefore, if the idea has a usual nature, we can 
research its premises according to the regulative logics of architectural design and here plausibly offer a digital 
twin of the project idea, which is then confronted with the nature of the artifact, or the cognitive process given by 
the physical survey and the historical documentary study. In this case then the result shifts to making the project 
visible from the object. 


The structures that are virtually duplicated remain of the usual nature at the ontological level of idea, and the effect 
of this operation is an architectural image and form that mediates the direct study work on the object and the design 
process that preceded it. On the one hand, we are in the sphere of the late Renaissance, and we are confronted with 
a pragmatic architect by profession such as Antonio da Sangallo il Giovane, who in the service of the Medici 
family, engaged in urbanistic actions of profound transformation in that of Pitigliano, a contested episcopal city 
and a border town right between Rome and Florence. This area that today is called a minor interior area is 
characterized by castles, fortified towns and monasteries or convents, and the town of Sovana connected today 
with Sorano is one of these centers that supplied travertine and valuable mineral resources to Rome. The operation 
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carried out in Santa Maria Sovana is trying to represent an evolutionary diachronicity of a site that is resemantized 
within the urban grid. 


2. THE DHOMUS PROJECT, MATERIALS AND METHODS 


In the framework of the research described above, the DHoMus project, conducted in collaboration between the 
Department of Architecture (disciplinary scientific sector ICAR/17) of the University of Florence, the Diocese of 
Pitigliano-Sovana-Orbetello and the Diocesan Museum Palazzo Orsini in Pitigliano, began in March 2020. Action 
aimed at safeguarding cultural heritage and in line with the idea of a diffuse museum (Aiello, 2020b). Two relevant 
historical emergencies such as the church of the convent of St. Francesco in Pitigliano and the church of St. Maria 
in Sovana refer to the first museum pole in Pitigliano (Stefanini, et al. 2021). 


2.1 The case study of St. Francesco church, Pitigliano 


The Convent of St. Francesco, located outside the urban center of Pitigliano, is in a state of ruins today (fig. 01). 
In fact, the building, which was built in the XVI century to a design by Antonio da Sangallo il Giovane, was soon 
abandoned in the early years of the XVIII century under the pressure of the Napoleonic suppressions, leading to a 
gradual process of decay. In the second half of the 1900s the Diocese implemented a parceling out of the convent 
complex and remained the owner of only the church. 


There are still many elements of interest of the ruin, in addition to its architectural definition, that lead us to focus 
attention on this building again. 


The first phase of the study was to, as is customary, prepare an integrated survey project, which in 2019 approached 
a three-dimensional modeling of the actual state with care taken to keep the textural and chromatic data as faithful 
as possible to the actual appearance of the ruin (fig. 02) (Lecci et al. 2021). At the same time, research was carried 
out at the Uffizi Gabinetto of Drawings and Prints, a valuable fund for those interested in architectural design and 
the studies made by Renaissance architects. 


Fortunately surviving the events of about half a millennium are two papers precisely concerning the San Francesco 
in Pitigliano, where the architect in the early period of his professional activity sketches two plans of the building. 
From some reconstructions of the activity and the placement of the drawings it is plausible to think that the church 
in 1522 was already definitely built. 


The drawing depicts, on the recto of the page, the plan layout of the convent complex of St. Francesco, consisting 
of two cloisters around which the buildings are attested (fig. 03). Accompanying the project sketch is a legend 
indicating the function of the various buildings. What remains to this day of the entire convent is solely the church 
part. Church designed with a single nave with leaning against it on the long side three polygonal chapels 
extroflexed with internal apse. Note also that the church was planned to have a vestibule with three entrances from 
which to enter, now lost. 


The study of Sangallo's drawing, given the differences in the church between the project and its present state, has 
guided the research toward a more thorough investigation of the project itself, taking an interest in how the building 
was conceived and how it should have looked in its original state (Aiello, 2020a). 


The drawing seems to plan an overall idea and that the realization therefore according to the document could have 
been synchronic. The survey also provided us with a reading of the parts attributable to the original layout and 
thus compatible with the plan found in the Uffizi Gabinetto of Drawings and Prints. 


The functional distribution of the architectural complex was identified directly from the legend of the original 
sketch. An initial graphic elaboration was carried out that could communicate this information more clearly and 
immediately, placing the 16" century design sketch at the center and explaining the dislocation of the rooms by 
highlighting them and their wording, as specified by the architect in the legend (fig. 04). 


More extensive archival research identified designs that could be compared with that of St. Francesco in terms of 
characteristics and style, attempting to delineate the possible elevation thought up by the architect. 
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Fig. 1: Photo of the church of the convent of San Francesco in Pitigliano, GR. 


Fig. 2: Plan of the church of the convent of San Francesco in Pitigliano. Graphic elaboration obtained from the 
survey and developed by the arch. Luca Pasqualotti in his Architecture degree’s Abitare il Paesaggio Storico 
(Pasqualotti, 2020). 
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Fig. 3: Photo of page n°811 A, drawing by Antonio da Sangallo il Giovane showing the plan of the convent of 
San Francesco in Pitigliano. Gabinetto of Drawings and Prints in Uffizi, 16" century. 
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On the survey and the iconographic document, the metric-proportional study of the entire plan layout was carried 
out, with the aim of verifying whether the project, even if in the form of a sketch, had been conceived according 
to proportional ratios and/or according to specific mensural canons. 


Such a possible positive finding would on the one hand have helped the reading of the architectural portion visible 
today and on the other hand would have added information about the figure of the architect himself, regarding his 
modus operandi as a architectural designer. The analysis was based on the planimetry, from which the design 
geometries were highlighted from the proportional diagrams of the two cloisters. These were then investigated for 
the existence of any measurement modules between them (fig. 05) (Pasqualotti, 2020). Analyzing the length and 
width ratios of the greater cloister revealed an internal scanning in squares of sides equal to the span of the 
intercolumn of the portico. This correspondence thus revealed the existence of a modularity that, aggregated in a 
ratio of 4:5, punctuates the entire composition of the cloister itself. The modular quantity derived from this ratio 
was extended to the entire plan development of the complex, bringing out the same correspondence between 
module and project, thus suggesting that the architect had a clear proportional geometric structure of reference (fig. 
06) (Zerbini, 2022). 


Fig. 4: On the left: Graphic reworking of the project sketch by Antonio da Sangallo il Giovane; on the right: 
visualization of the sketch showing the different functions of places. 


Fig. 5: Compositional-proportional studies performed on the project sketch of the plan of San Francesco’s 
convent in Pitigliano. 
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Fig. 6: Compositional-proportional study on the plan of the church through the modular grid obtained from the 
previous study. 
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The proportional geometric protocol then applied directly to the survey of the surviving structure revealed a direct 
match even in the elevations, which are still recognizable today. The building consistency, therefore, has been 
identified in its main characteristics of heights and distributions. 


Missing at this point a stylistic reference to associate with the compositional one to trace the design idea. In this 
regard, compositional constraints of the surveyed structure were sought, such as portals, still recognizable facade 
openings, and traces of the ante-facade vestibule. It was then observed that the portals of access to the vestibule 
and the presence of a central rose window, would have conditioned the elevation of the vestibule itself, which was 
meant to allow direct light intake. For these reasons, the vestibule designed by Sangallo was assumed to be a 
single-register element, over which the gable of the church rises. 


The result obtained was the subject of three-dimensional modeling, which from the plan design shows the 
construction of the building according to the proposed hypothesis and finally overlaps with the model inferred 
from the survey, of the church in its present state (fig. 07). 


Two models of a different nature are thus obtained, the first is produced from a survey of the actual state, while 
the second is produced from a design model, constrained planimetrically and determined in elevations by the 
proportions introduced by the plans of the Sangallo il Giovane, and its analogous projects. 


The final idea was to produce holographically, the superimposition of the two models and the representation of the 
document testifying to the design intention of the Sangallo il Giovane. 


Fig. 7: Three-dimensional project of the hypothetical reconstruction of the church of San Francesco’s convent 
according to the idea of Antonio da Sangallo il Giovane. And overlap of 3D model with the existing monument. 
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2.2 The case study of St. Maria church, Sovana 


The small town of Sovana is characterized by the central Pretorio square and the ruins of the church of the patron 
saint San Mamiliano. It, believed to be the first church-cathedral of Sovana, is placed directly on the remains of 
an Etruscan and then Roman building almost bordering the Pretorio square whose northern edge is defined by the 
side of the church of Santa Maria. 


It perhaps to be identified as that Santa Maria, which is consecrated by Bishop Ranieri of Tuscania in 1208, is 
certainly mentioned already in the will of Count Ildebrandino Aldobrandeschi, called il Rosso, of 1284, and 
recorded in the Decimari of 1296. In 1321-24, it was looted and possibly damaged by the Sienese in 1410 and by 
the people of Pitigliano in 1434. Around 1558, the construction, on the initiative of Grand Duke Cosimo I 
de'Medici, of the Loggia with archive building, later to become Palazzo Burbon del Monte, deprived it of its facade, 
limiting access to only the side portal, open to the square (fig. 08). 


The interior of the church is of the basilica type, with three naves divided by polygonal pillars supporting wide 
round arches. The nave is divided into three bays by Gothic transverse arches, on which the wooden trusses of the 
roof are set. In the center of the presbytery, raised on a few steps, is the famous ciborium (VIII-IX century), the 
only example of its type in Tuscany, referable to the pre-Romanesque period (fig. 09) (Rivetti, 2018). 


In the work being presented, it is interesting to consider the urbanistic placement of the church in relation to the 
whole plan the plant. The latter consists of the square and a series of water distribution infrastructures that affect 
the last terracing where the entire city is set. The church, therefore, is in line with the oldest route that leads directly 
to the site of the present cathedral. Along this route, in the proximity of the apse of St. Maria's itself is the Public 
Fountain, currently facing the square, and a wash house at a lower level than the Fountain, adjacent, however, to 
the extroversion of the church's apse. In further analyzing the archival documents in the Technical Report of the 
Superintendence for the consolidation works of the church in 1984 it appears that in the apse outside during the 
excavation an empty archway was found, probably a possible ancient water access route connected to the water 
system described above (fig. 10). 


The continuing problems inherent in rising damp that affected the interior of the church induced to investigate 
what kind of water structure was present of the area at the time before the church was built. In 2021, thanks to the 
INGV (National Institute of Geophysics and Volcanology) of the CNR in Florence, it was possible to resort to a 
geophysical exploration inside and outside the church by means of an electromagnetic radar technique (Ground 
Penetrating Radar, GPR) survey at 300 MHz and 800 MHz. This made it possible to investigate the stratigraphic 
conformation and the presence of any structures in the subsurface. The results obtained indicate that there is a 
depressed area under the church with a compluvial development at the bottom, probably backfilled with material 
similar (tuffaceous elements) to that of church construction. The compluvium appears to be directly in axis with 
the apse and its opening to the outside described above, still present today, and a collector placed in the direction 
of the present wash houses (fig. 11). 


The water line, therefore, running from the cathedral to the main square, suggested the possibility that it was a 
permanence of water system of the classical period, also given the baths of the villa under San Mamiliano. The 
geophysical survey, moreover, returned a different density of material between the area of the aisles and the central 
nave of the church opposite the apse, delineating a well-defined rectangular area that ends in conjunction with the 
two rows of pillars. Opposite the present chancel, a very large apsidal form can be discerned that opens and extends 
the entire length of the church. This materializes a kind of double-apsidal plan with a central basin and an inflow 
channel that were placed in axis with the entire external water system (fig. 12). 


Regarding the problem of continuous rising damp that still affects the structure today, it is shown that water 
continues to pass through by capillarity. It is noted how the level of humidity is distributed on the floor forming a 
hemicycle opposite to that of the apsidal basin of the church. These results gave the possibility to advance 
hypotheses about the previous use of the area by focusing on possible references to nymphaea of urban areas of 
Roman cities or classical in general, tried to connect the various outcrops of water adduction. 


To make visible all that the instrumentation identified and highlighted, the choice was made not to make a copy of 
the church itself as such, but to structure a possible evolution from the Classical hypogeum layout to a conversion 
of the structure into a religious site of the early medieval period. The choice of 3D reconstruction is emphasized 
by the choice of the use of holographic representation, which through moving scenes, comprehensively depicts the 
steps in the history of the city of Sovana (Stefanini, et al. 2022). 
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Fig. 9: Graphic elaboration obtained from the survey and developed by the arch. Domenico Rivetti in his 
Architecture degree’s J battistero di S. Maria nella profondità della sua storia, 2018. 


Fig. 10: Technical Report of the Superintendence, 1984. Detail of the empty archway found in the apse outside 
during the excavation. 
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Fig. 11: Geophysical survey, 2021. INGV of CNR of Florence. On the left: internal section of the church GPR 
300 MHz (z=190cm) - Line 8.5m. On the right: overlay of EMP 400 and GPR 800 MHz comparative analysis. 


Fig. 12: Reconstructive hypothesis of nymphaeum types present in the hypogeum of the church of St. Maria. 
Arch. Domenico Rivetti in his Architecture degree's, 2018. 
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3. RESULTS 


Operationally, the holographic representation, is actualized through the support of the holographic showcase, an 
instrument formed by a monitor that contains the images or video to be projected. They are reflected on the glass 
faces of a hollow pyramid trunk and recomposed in the center. The operation is based on the theoretical principle 
of projective geometry, whereby the image, or video, contained in the monitor is projected onto the transparent 
surfaces inclined at 45° of the prism, directly applying the principles of projectivity (homology) (Lecci, et al. 2019). 


Interestingly, at a strictly operational and technical stage, a properly theoretical principle is being materialized into 
a result provided by theory. The principles of projective geometry make it possible to faithfully recreate an image 
or video animation of digitally made objects that appear via optical effect, in a three-dimensional view at the center 
of the pyramid (Yamanouchi, et al. 2016). To create the holographic projection, it was necessary to make a video 
that would show all the stages of the research previously described and be able to communicate the story of the 
building clearly and directly. The work involved writing a storyboard of the research contents and making them 
the protagonists of the images to be produced, structuring a storytelling to be included in the holographic showcase. 
To do this, the 2D and 3D material obtained from the survey was used, with which a real narrative plot was 
developed for the creation of the video animation (Gabellone, 2014). 


The storyboard made it possible to sequence all the steps to be told, hierarchizing the information and, at the same 
time, managing its timing, effects, and steps. In the case of the church of St. Francesco, the paper document 
becomes the protagonist of a three-dimensional reconstruction of the project that then gradually reifies into the 
remaining structure of that architectural idea of which we have the trace sketched as a starting point. Geometric 
analysis of the plan follows until the building is made visible according to the architect's compositional ideas, 
finally arriving of what remains of the walls of the present ruin (fig. 13). 


In the case of the church of St. Maria, on the other hand, the narrative starts from the dissolving of the interior and 
exterior walls of the church until it arrives at the level of the floor, beyond which the reconstructive hypothesis of 
that hypogeous environment, known only through the studies carried out on the subsoil, is depicted. Thus, what is 
not visible is made visible, leaving an idea, a curiosity about what is hidden under the thick walls of the Sovana 
church that we still see today (fig. 14). 


3.1 The choice of hologram projection 


Using high technology and new methods of digital representation such as holography, we expect to develop more 
explanatory and engaging narratives (Luschi, et al. 2023). Holograms become a form of interactive and educational 
visualization, closer to reality by distancing themselves from the use of the support of visors (VR) that isolate from 
the outside world. 


With the implementation of the above-described videos in the holographic showcases inside the Diocesan Museum 
of Palazzo Orsini in Pitigliano, it was possible to give visibility to the museum and its inaccessible external 
archaeological sites, making them virtually visitable (figs. 15,16). 


The museum reality has the purpose of both musealization and being able to have effective communication with 
visitors. The goal is to present new forms of communication that allow the dissemination of knowledge about the 
historical-archaeological heritage. Allowing the involvement of increasingly diverse user targets, offering a 
complete and satisfying visit (Lecci, et al. 2022). 


4. CONCLUSIONS 


The results that have emerged from the two approaches and experiences are of different tenors. The first that of 
the church of St. Francesco is the attempt to pursue an idea that leaves its trace from the beginning for the 
conception of the drawing, to its realization. In between these two major terms is an activity of logical 
reconstitutions of the project idea that is verified in parallel, both by the drawing and the preliminary sketch, and 
by the result whereby the extreme terms make the intermediate procedure more and more plausible. 


The second experience, on the other hand, there is an evolution to be made visible, still, and partly inaccessible. 
The technology used puts us in a position to extrapolate a drawing. This drawing is a fact that, however, must be 
compared with the actual drawing of the church of St. Maria. 
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Fig. 13: Some frames in sequence of the video elaboration of St. Francesco’s convent in Pitigliano, created to be 
projected inside the holographic showcase. 


Fig. 14: Some frames in sequence of the video elaboration of St. Maria in Sovana, created to be projected inside 
the holographic showcase. 


Fig. 15: The showcase shows the holographic projection of the video, as indicated by the sequence of frames. 


Here it is that between these two moments a logic must intervene introduced precisely by a digital model that 
mediates the positions, the hypotheses and makes them visible in a becoming understandable and consistent with 
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what are the scientific data placed at the end of it. 


So, if in the first case there is a drawing of a project and then a verification of that data through a model, in the 
second case we have two survey models of a different nature working together and helping to verify the 
evolutionary stages of a site. Focusing not on the representation of the object itself but of an evolution of an idea 
and a resemantization of urban spaces. 


The digital copies, then, managed to show the effectiveness of a research and its different moments of in-depth 
reconstruction of hypotheses, returning a unicum that is coherent and somehow re-presenting the becoming of 
these architectures, in the passage of time. 


Fig. 16: Photos of the holographic showcases with their projections, inside the Museo Diocesano of Palazzo 
Orsini in Pitigliano, on the inauguration day, year 2021. 
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ABSTRACT: Nowadays, despite the growing attention to indoor environmental quality and comfort, existing 
workplaces still often fail to meet employees’ expectations and needs, affecting their well-being and productivity. 
In order to improve management decisions, crucial insights can be provided by the timely correlation of 
objective workplace conditions, observed by sensors, and subjective workers’ feedback, collected through 
Ecological Momentary Assessment (EMA) method. This paper presents a prototypical Digital Twin for the 
assessment of workplace performance from an occupant-centric perspective, based on the integration of IoT, 
BIM and Semantic Web technologies. Following the definition of relevant use cases and requirements a layered 
system architecture is presented and the prototype implementation is discussed. For capturing the workplace’s 
environmental properties, a sensor network based on the Zigbee communication standard is proposed due to its 
data transmission efficiency. The measured data, converted in the lightweight MOTT protocol, are streamed to 
an InfluxDB time series database where they are stored along with the incoming workers’ feedback collected as 
survey responses with a dedicated web application. These time series data are queried and transported into a 
developed web platform for integrating BIM and RDF data within the standardized structure of Information 
Containers for linked Document Delivery (ICDDs). Inside this platform, the IFC model of the workplace, the 
measured data from the sensors, and the worker generated RDF data according to the WOMO ontology for 
occupant-centric workplace management are linked. The capabilities of the workplace Digital Twin prototype 
are finally demonstrated querying the linked heterogeneous data to fulfil workplace management tasks in a case 
study provided at the end of the paper. 


KEYWORDS: Digital Twin, Workplace performance assessment, Well-being and productivity, Linked Data, 
Information Container for linked Document Delivery (ICDD), Semantic Web, Internet-of-Things (IoT). 


1. INTRODUCTION 


Providing high-quality indoor workplaces that meet their occupants’ needs is a challenge of utmost importance 
for Facility Managers (FM) because of the critical impact they have not only on employees’ quality-of-life, 
health and well-being (Vischer and Wifi, 2017), but also productivity (Al Horr, Arif, Kaushik, et al., 2016). 
However, although the growing adoption of Information and Communication Technologies (ICT) to control and 
automate building systems (e.g. HVAC, lighting, access, etc.) has enabled an unprecedented granularity and 
interactivity in the operation of workplaces, evidence suggest that they still fall short of their occupants’ 
expectations (Abbaszadeh ef al., 2006). To address this issue, occupant-centric approaches for control and 
operation of buildings have been recently proposed, shifting the technology-centred paradigm to the recognition 
of the user, with its individual and dynamic physiological and psychological requirements, as the most critical 
component in the occupant-building system (O’Brien et al., 2020). 


Recent research focused on supporting managers in the assessment of workplace performance providing them a 
constant holistic understanding of employees’ individual activities, preferences, and conditions within their 
physical and social work environment. In this regard, effective solutions have been proposed for the timely 
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collection of occupant- and building-generated data and their semantic integration, processing, and visualization 
using Building Information Modeling (BIM), sensor networks and Semantic Web technologies (Abdelrahman, 
Chong and Miller, 2022; Donkers, de Vries and Yang, 2022a). However, the development and implementation of 
these approaches for workplace management purposes are still in their early stages due to limitations in domain 
knowledge representation, and heterogeneous data integration strategies that need further investigation. 


In order to address these issues, this paper presents the concept of a semantic Digital Twin for the integration and 
exploitation of heterogeneous workplace data, built on the findings of previous contributions by the authors. The 
research framework and workplace domain knowledge formalization are discussed in Bruttini et al., (2022), 
while the storage and processing of semantic data pivots on the use of standardized information containers (ISO 
21597-1:2020) through a dedicated web platform whose effectiveness has been demonstrated in asset and project 
management use cases (Sigalov et al., 2021; Hagedorn, Liu, et al., 2023). In the followings, after the discussion 
of the findings of relevant related works, the system development, prototypical implementation and case study 
demonstration for a workplace performance use case are provided. 


2. BACKGROUND 


Over the past decade, the pursuit of the benefits obtainable with data-driven management and control of the built 
environment with the specialization of the concept of “Digital Twin” for the AECO sector, has witnessed an 
exponential growth (Sacks et al., 2020). The use of Sematic Web technologies and the diffusion of the Linked 
Building Data (LBD)! approach paved the way for the integration of heterogeneous information from diverse 
knowledge domains, hence overcoming the initial limitations of BIM. In particular, the possibility to observe 
building operational conditions, e.g., indoor environmental quality (IEQ) factors and systems status, and 
contextually evaluate them against the way the occupants behave and perceive a given space, enabled an 
unprecedented understanding of building-occupant complex interactions, starting the long-awaited paradigm 
shift towards occupant-centric approaches in building management and operation (O’Brien et al., 2020). 


In the following paragraphs, the findings and limitations of recent relevant studies related to occupant-centric 
building operation and workplace performance assessment are reported. Then, a review on sensor network 
solutions for building monitoring is presented and the state-of-the-art for the semantic integration of building 
data is discussed. 


2.1. Occupant-centric building operation and workplace performance assessment 


An indoor workplace represents a complex system characterized by dynamic mutual interactions between the 
physical space and its occupants. Therefore, to assess workplace performance, quantifying the extent to which it 
supports workers’ activities and meets their needs and expectations is crucial. In this regard, extensive literature 
investigated the impact of different physical and non-physical factors on workers’ satisfaction, productivity, and 
well-being, from workplace IEQ parameters (Al Horr, Arif, Katafygiotou, et al., 2016) to workspace layout (Kim 
and de Dear, 2013) and the degree of user perceived control on the environment (Luo et al., 2016). 


To monitor how these and other heterogeneous factors affect the workers, both building objective properties, 
observed by sensors, and workers subjective conditions must be timely collected and integrated. For the latter, 
indirect approaches based on inferences from historical building systems’ data (e.g., lighting usage for visual 
quality assessment) are making way to the collection of direct occupant feedback through smartphone and web 
applications and wearable devices (Nagy et al., 2023). For this purpose, the Ecological Momentary Assessment 
(EMA) approach (Shiffman, Stone and Hufford, 2008) initially developed for medical and social researches, is 
progressively replacing Post Occupancy Evaluation (POE) for the collection of frequent, real-time occupants’ 
feedback directly from their workplace environments via the use of micro-surveys (Engelen and Held, 2019). 


Nonetheless, a system that enables the semantic integration, processing, querying and visualization of building- 
and worker-generated data is necessary to support occupant-centric workplace management and performance 
assessment. In this regard, recent contributions showed the feasibility and opportunities provided by the adoption 
of BIM, sensor networks and semantic web technologies for the integration of static and dynamic domain- 


' W3C Linked Building Data (LBD) Community Group - https://www.w3.org/community/Ibd/ - (accessed 
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specific data for IEQ and occupant experience assessment (Abdelrahman, Chong and Miller, 2022; Donkers, de 
Vries and Yang, 2022b, 2022a). However, since indoor workplace management purposes still need to be fully 
addressed with specific solutions, the authors developed a framework and an ontological representation of 
worker’s conditions and activities in indoor environments which form the conceptual basis of the workplace 
digital twin prototype proposed in this paper (Bruttini et al., 2022). 


2.2. Sensor networks for building monitoring 


For the contextualization of subjective worker’s data, his surrounding environmental conditions must be 
objectively observed through a sensor infrastructure which provides for sampling, transferring, and storing of the 
sensed data. In recent years, Internet-of-Things (IoT) technology has established as the main solution for the 
implementation of such sensor networks. Most of them make use of a similar multi-level hardware architecture 
that addresses the above-mentioned challenges. Kifouche et al. (2017) describes three levels, the sensor-, 
gateway- and base-station-level. Li et al. (2023) add a further application-layer for data visualization interfaces. 


Concerning the sensors, both wired and wireless solution can be deployed. While the former can be more 
reliable, it needs additional cabling and therefore lacks in flexibility (Tanasiev et al., 2021). Hence most of the 
studies use wireless solutions, relying on sensor devices that consist of a sensing unit, a microcontroller and a 
radio adapter for the data transmission. As sensing unit, anything from the widely used temperature and humidity 
sensors up to a motion or CO2-level sensor is possible. A microcontroller reads out its data and sends them to a 
gateway via Wi-Fi, Bluetooth (Li et al., 2023), Zigbee (Kifouche et al., 2017) or LoRa (Kifouche et al., 2017; 
Tanasiev et al., 2021). As wireless sensors are mostly battery powered, power management is a crucial aspect, 
finding the right balance between power consumption, bandwidth, and transmission range. While some studies 
assemble their own sensor devices on a prototypical base, there are approaches as well that make use of out of 
the shelf sensor devices with proper device housing (Chamari, Petrova and Pauwels, 2023). 


The gateway is placed on site and acts as a translator receiving data from the sensors and routing them into the 
backend system. Thus, the selection of the radio technology and communication protocol, together with the 
building substance and materials, substantially effects the network range, determining the number of gateways 
needed for a seamless coverage inside the building (Kifouche et al., 2017). As used within several works 
(Kifouche et al., 2017; Tanasiev et al., 2021; Li et al., 2023), it is suitable to implement the gateway with a low 
cost SoC computer like a Raspberry Pi, which connects via Ethernet with the backend system. 


Detached from the installation on site, the backend system can be deployed anywhere else, even in the cloud. It 
implements a software solution for processing and storing the forwarded data. While earlier works made use of 
individually designed solutions (Kifouche ef al., 2017), recent studies showed the effective adoption of the 
machine-to-machine (M2M) and IoT protocol called Message Queuing Telemetry Transport (MQTT) (Tanasiev 
et al., 2021; Chamari, Petrova and Pauwels, 2023; Li et al., 2023). Eventually, besides relational databases such 
as MySQL(Kifouche et al., 2017; Tanasiev et al., 2021; Zhang and Beetz, 2022; Li et al., 2023), for storing 
sensor observations the adoption of NoSQL, time series databases and RDF data stores is growing especially in 
semantic information model applications (Chamari, Petrova and Pauwels, 2023). 


2.3. Linked Data and information containers for semantic Digital Twins 


With the recent advent of the Digital Twin paradigm in the AECO sector, viable solutions for the integration of 
BIM with both static and dynamic data for the real-time representation of physical built assets became crucial. In 
this regard, several studies proved that the adoption of Semantic Web technologies and Linked Data approach 
enable the deployment of semantic Digital Twins where building information can be enriched via linking with 
heterogeneous domain-specific data, whose representation is in turn demanded to dedicated ontologies 
(Mavrokapnidis et al., 2021; Eneyew, Capretz and Bitsuamlak, 2022). 


However, in a context where a standardized approach for the creation and maintenance of Digital Twins is still 
missing and systems’ requirements are subject to frequent transformation, using BIM data and models as 
common basis for the implementation of domain-specific Digital Twins with a modular approach can provide the 
much-needed flexibility and scalability. Kosse et al., (2023) argue how modular Digital Twins can be 
implemented with the use of standardized information containers which provide a model for storing and 
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exchanging heterogeneous information. In particular, as shown in Polter and Scherer (2023), and Zinke et al. 
(2023), Information Containers for linked Document Delivery (ICDDs), compliant with the ISO 21597-1 (2020), 
are suitable for this purpose since they implement a vendor-neutral data structure which integrates, besides 
payload documents to be exchanged, distributed linked data. Moreover, supporting the Linked Data approach, 
their interconnection to web standards such as HTTP and REST is easily implementable, while Semantic Web 
technologies enable data retrieval through SPARQL queries. Arbitrary data can be modeled using an ontological 
layer that can be stored in the container or as web resources, while a specific linking structure supplements the 
capability of the container to host a Digital Twin. Furthermore, as demonstrated Senthilvel and Beetz (2021), the 
possibility to nest and interlink ICDD individual container modules opens to a scalable system-of-systems 
approach where compatibility is ensured by the containers’ conformity to ISO 21597. 


Likewise, in previous research the authors showed the feasibility and versatility obtainable with the adoption of 
ICDD containers for the storage, integration, querying and visualization of building and domain-specific 
information with the development of a dedicated web platform. With differences in implemented functions and 
user interface, the proposed approach proved effective for different use cases, including infrastructure asset 
management (Hagedorn, Liu, et al., 2023), and smart contracts-based automated payment and contract 
management (Sigalov et al., 2021). On this basis, as detailed in the followings, a customized version of the 
mentioned ICDD platform is proposed for the implementation of the presented workplace digital twin prototype. 


3. RESEARCH FRAMEWORK AND METHODOLOGY 


The present study is part of a broader research effort which aims at the realization of a data-driven workplace 
management framework finalized to the enhancement of workers’ well-being and productivity through capturing 
and understanding the dynamic worker-workplace interactions. To achieve this goal, a workplace semantic 
digital twin, able to integrate heterogeneous occupant- and building-generated data for workplace management 
purposes, is proposed. With reference to Fig. /, this paragraph describes the study’s methodology, from the 
system conceptualization to its prototypical implementation to a room-scale case study. 


DATA MODEL DEVELOPMENT 
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Fig. 1 Research methodology 


As discussed above (see §2), the development and implementation of the proposed system stems from previous 
authors’ contributions. The formal representation of workplace knowledge, characterized as the intersection of 
the worker-building-activity semantic domains, is provided in the Occupant-centric Workplace Management 
Ontology (WOMO) presented in Bruttini et al., (2022). Workers’ objective and subjective features, along with 
their current activity, are described and interlinked through feedback instances, hence related to the 
correspondent building spaces and conditions. These are represented by reusing well-established ontologies, 
such as the Building Topology Ontology (BOTY and Semantic Sensor Network ontology (SSN). In turn, for the 
semantic storage and integration of the aforementioned heterogeneous workplace data, standardized information 
containers (ISO 21597-1, 2020) are adopted. Therefore, the workplace knowledge base, comprising of its IFC 
model, workers’ and sensors’ data, is realized through an ICDD container and semantic data integration, 


? https://w3c-lbd-cg.github.io/bot/ - (accessed 12/07/2023) 
3 https://www.w3.org/TR/vocab-ssn/ - (accessed 12/07/2023) 
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querying, visualization, and rule-based validation are enabled by a dedicated web platform developed by the 
authors (Hagedorn, Pauwels, et al., 2023; Sigalov et al., 2021; Hagedorn, Liu, et al., 2023; Hagedorn, Senthilvel, 
et al., 2023). 


The system development involved three main steps, namely: use cases’ identification, requirements definition 
and architecture conceptualization. Then, the criteria for the selection of the hardware and software solutions for 
the system prototype implementation are described, including the development of a custom web application for 
the collection of workers’ feedback. Eventually, the capabilities of the prototype are evaluated for a case study 
office room. A workplace performance assessment use case is tested through the querying and visualization of 
the collected workers’ feedback and contextual observed environmental conditions. 


4. SYSTEM DEVELOPMENT 
4.1. System use cases 


According to the research framework’s overarching goal and to the scope and purposes that drove the workplace 
domain knowledge formalization within the WOMO ontology (Bruttini et al., 2022), the system shall inform and 
support managers’ decisions for the improvement of employees’ well-being and productivity, providing insights 
from the correlation between workers’ subjective feedback and workplaces’ objective conditions. For this 
purpose, three general use cases have been identified, namely: 


e Workplace performance assessment — Evaluation of how a workplace supports or hinders its occupants 
through the correlation of workers’ subjective feedback with the objective indoor environmental conditions. 

e Workplace issue discovery — Evaluation of the factors that contribute to the occurrence of unsatisfactory 
conditions and identification of latent issues that affect workers’ needs (e.g., privacy, focus, lighting, etc.). 

e Worker preference clustering & Spatial recommendation: Recognition and evaluation of recurrent data 
patterns and correlations, and implementation of artificial intelligence-enabled methods for learning and 
predicting ideal conditions for worker groups or profiles (e.g., based on environmental preferences, activity 
needs, etc.), and for the recommending solutions for underperforming spaces or occurred issues. 


The presented system prototype implementation focuses on the workplace performance assessment use case, 
leaving the remaining to future developments. 


4.2. System requirements 


The system requirements, on which the following system architecture conceptualization and prototype 
implementation is based, are listed below per functional area: 


e Building-generated data collection — To monitor workplace’s indoor environments, the sensors’ typology 
and communication protocol shall favour easy, flexible, and affordable deployment while providing 
reasonable accuracy. 

e = Occupant-generated data collection — The collection of worker objective and subjective data shall be using 
voluntary feedback responses to timely micro-surveys developed according to the EMA methodology. 
Feedback time and location are mandatory, while customizable feedback request’s generation (e.g., 
scheduled, voluntary), survey’s prompts and rating scales shall be granted. Collection of workers’ 
momentary health indicators (e.g., heart rate), environmental preferences (e.g., thermal quality), and self- 
assessed conditions (e.g., productivity) shall be enabled along with their current activity. 

e Time series data handling and storage: Both the transmitted building- and occupant-generated data shall be 
stored and organized in a store specialized for time series data. The database structure shall not be 
constrained in terms of data sources (i.e., sensors, feedback interfaces) and structure (i.e., building observed 
properties, worker data and survey fields), and shall allow data storage efficiency (i.e., down sampling), 
querying and aggregation. 

e Data semantic integration, querying and visualization: Heterogeneous static and dynamic workplace data 
shall be integrable to realize an evolving workplace knowledge base where information is semantically 
structured and interlinked according to acknowledged ontologies, and reasoning, querying, and 
visualization are enabled. This shall include, but not be limited to, the geometries and properties of building 
spaces and elements (i.e., IFC model), sensors’ observations and workers’ feedback. 
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4.3. System architecture 


On these assumptions, a comprehensive four-layered system architecture has been drawn as shown in Fig. 2. At 
the bottom, the physical layer represents the physical workplace from which the data describing the building 
operational and environmental properties (e.g., air temperature, window opening, etc.) are collected by the 
deployed sensors, and workers’ objective and subjective data (e.g., location, environmental preference, activity, 
etc.) are provided via momentary feedback. This layer transfers the data to the upper data storage layer where 
two functionally distinct stores are identified: one dedicated to dynamic data (i.e., timeseries sensors’ 
observations and workers’ feedback); the other, dedicated to consolidated data (e.g., aggregated sensor 
observations) and less frequently changing or static building information. The latter forms the system’s core 
knowledge base, where workplace heterogeneous information (e.g., BIM model, organizational data, timeseries, 
etc.) are stored according to appointed ontologies and hence semantically interlinked. 
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Fig. 2 Workplace digital twin system architecture 


In turn, the data integration layer provides access to the workplace static and dynamic data, allows for their 
semantic integration and processing, and mediates the incoming requests from the top application layer. This 
last layer provides the user, i.e., manager, with the digital twin-based services that shall serve the identified use 
cases, such as: workplace condition monitoring and visualization; building-activity-worker data correlation for 
workplace performance assessment, issue discovery and spatial recommendation. 


4.4. System prototype implementation 


This paragraph describes the hardware and software choices taken for the system prototype implementation. As 
shown in Fig. 3, the physical workplace can be represented as the combination of several workspaces (i.e., 
building’s spaces) that shares spatial, organizational, or functional properties at different scales and are occupied 
by the workers during their daily working routines (e.g., a part of an open space, a single room, a workstation, 
etc). For this implementation, the system’s targeted workspaces consist in private or shared office rooms with a 
gross floor area not exceeding 50 m?. For workspace properties monitoring, a wireless network based on the 
Zigbee communication standard has been chosen due to the acknowledged performances in terms of network 
reliability, ease of deployment and affordability of compatible commercial devices in smart building 
applications. Three types of sensors have been selected to monitor objective workspace properties, namely: 
temperature and humidity sensor for the thermal environment; contact sensor for window status; motion sensor 
for occupancy detection and count. All types of sensors are battery-powered, can be installed without screws and 
transmit data wireless, allowing for fast and flexible deployment, substitution, and maintenance. A Raspberry Pi* 


4 https://www.raspberrypi.com/products/raspberry-pi-3-model-b/ - (accessed 14/07/2023) 
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SoC provided with a universal USB Zigbee gateway is appointed as network coordinator and transmits sensor 
data to the system backend via Ethernet connection. Moreover, the mesh topology of the Zigbee network allows 
for the addition of power-supplied devices acting as repeaters (e.g., smart plugs), hence providing redundancy 
and easy range extensibility. 
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Fig. 3 Workplace digital twin system prototype implementation scheme 
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The encoding and transmission of the sensors’ data has been demanded to the MQTT protocol due to its 
suitability in IoT applications that requires lightweight, machine to machine, message exchange. The sensors’ 
observations (i.e., actual measurements and metadata) are gathered from the gateway, encoded into individual 
message packages, and published via MQTT protocol on dedicated topics, one for each sensor, to an Eclipse 
Mosquitto® broker instantiated at the system backend. On the same server, an InfluxDB* timeseries database 
instance connected with an agent (i.e., telegraf °) to the MQTT broker and subscribed to all relevant topics 
receives and stores the sensor-generated data into a dedicated bucket. 


For the collection of workers’ data, a web application presenting a one-page survey form has been developed. 
Accessing the form via browser, workers can provide feedback instances on a voluntary base. The survey 
interface is designed to allow fast responses in order to prevent survey fatigue bias. For this reason, worker ID 
and current location, corresponding to their allocated workstation, are preset and not editable by the responder. 
The other survey fields allow for multiple choice response and can be customized to query for the current worker 
activity and to express their environmental preferences and self-assed conditions. The survey web application 
uses the HTTP POST method and the Influx API to transmit and write the collected responses to timeseries 
records within a dedicated feedback bucket in the InfluxDB database. 


The last component of the prototype implementation consists in the aforementioned web platform through which 
the heterogeneous workplace data are stored in a dedicated ICDD container, are semantically interlinked 
according to predefined ontologies and hence processed, queried, visualized. Here, the workplace IFC model, 
comprising of the geometrical and functional information necessary for the description of the identified 


> https://mosquitto.org/ - (accessed 14/07/2023) 
€ https://www.influxdata.com/; https://www.influxdata.com/time-series-platform/telegraf/ - (accessed 14/07/2023) 
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workspaces, forms the foundation of the workplace digital twin knowledge base. Dedicated platforms’ functions 
retrieve sensors’ and workers’ data from the timeseries database and stores them as RDF triples accordingly to 
the SSN ontology. In turn, the building information contained in the IFC model are mapped to BOT ontology 
classes (i.e., bot:Space and bot:Element) and linked to sensors’ observations and workers’ features, 
preferences, activities and feedback according to the WOMO ontology. Eventually, the resulting knowledge 
graph can be queried within the platform, and dedicated services provide for the visualization of the results to 
support workplace performance assessment use cases. 


5. CASE STUDY 


In this section, the capabilities of the presented prototype are demonstrated with a room-scale case study that 
involved the collection of worker feedback and sensor data over a period of one week. The collected data are 
integrated with static workplace information stored in a correspondent ICDD container (e.g., IFC model), then 
queried and visualized for a performance assessment use case using the dedicated web platform. The appointed 
room is a shared office in availability of the Chair of Computing in Engineering at the Ruhr University Bochum 
(Bochum, Germany). The office and case study setup specification are shown in Fig. 4. 
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5.1. Sensor network setup 
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Fig. 4 Case study setup 


The deployed sensor network consists of commercial products based on the Zigbee standard and widely adopted 
in smart building applications. The main characteristics of the installed components are reported below, along 
with their vendor and model to allow for specification retrieval: 


e Air temperature and humidity sensor (x1) — It is positioned under the work plane of one of the desks in 
order to: avoid direct exposure to sunlight or radiators’ heat emission; avoid obstruction to other objects; 
observe occupants’ thermal micro-environment (i.e., height 0,70m) during work without being exposed to 
their body heat emission. [Aqara WSDCGQ11LM] 

e Contact sensor (x2) — They return the open/closed status of each operable window. [Aqara MCCGQ11LM] 
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e Motion sensor (x4) — They use passive infrared (PIR) detection and are positioned under each of the desks’ 
work planes. The sensor field-of-view is partially obstructed so that the detection of false positives is 
minimized. Presence at workstations is aggregated to determine room occupancy. [Aqara RTCGQI1LM] 

e Network coordinator — A Phoscon ConBee II universal USB Zigbee 3.0 gateway connects the sensors and 
can support other Zigbee compatible devices from different vendors. The gateway is installed on a 
Raspberry Pi 3 Model B, connected via ethernet to the backend. In turn, on the Raspberry Pi runs an open 
source Zigbee2MQTT’ bridge that enables network configuration (i.e., device pairing, removal and setting) 
with a graphical user interface accessible via browser. The bridge encodes the incoming sensor data into 
MQTT messages that are published to the Eclipse Mosquitto MQTT broker instance at the backend. The 
topics’ hierarchy focuses on the sensors, and presents three levels: argument, sensor type and ID (e.g., 
“sensors/contact/c01” for contact sensor “C01”). Therefore, decoupling the sensors’ deployment from the 
network configuration, higher flexibility is provided. Sensor metadata (e.g., observed window) are stored in 
the workplace knowledge graph. 


Eventually, an InfluxDB agent, telegraf, is configured to connect to the MQTT broker, subscribe to all the topics 
of interest (i.e., “sensors/#”’), decode sensor messages’ payloads and write the related observations in timeseries 
within the predisposed sensor bucket. 


5.2. Worker feedback web application 


The collection of workers’ feedback involved the four employees assigned to the case study office for one week 
(i.e., five workdays, 10-14" July, 2023). A web application has been implemented to collect their momentary 
feedback as responses to a single-page survey form (Fig. 5). Survey fatigue bias has been minimized presetting 
the workers’ IDs and location and limiting the survey’s queries to three. First, the specification of the current 
activity category and type is requested among five options: solo work, call, group work, break, or other 
unspecified activity. Then, the expression of the worker preference towards the perceived thermal quality is 
requested in a three-points scale: prefer cooler, no change, or prefer warmer. Eventually, the self-assessed 
productivity shall be indicated among not productive, normal or very productive. Rating scales’ mid-points 
represent satisfaction with the environment and baseline productivity. To submit the feedback, at least one query 
must be responded. The survey response data are posted to Influx DB and stored as timeseries in the dedicated 
feedback bucket. The subjects involved have been informed about the research purposes, have agreed to 
voluntarily provide the feedback data and allow for their anonymized use and dissemination. 
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Condition 


Fig. 5 Worker feedback web application — Survey form 


7 https://www.zigbee2matt.io/ - (accessed 17/07/2023) 
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5.3. Workplace performance assessment 


The case study evaluation of the workplace digital twin prototype has been carried out in terms of its capabilities 
of integration, querying, and visualization of heterogeneous data (i.e., sensors, workers, building) for workplace 
performance assessment. For this purpose, a use case related to the assessment of the perceived thermal quality 
has been specified in form of the following competency questions (CQs): 


CQ1: How has a workspace performed for thermal quality in a certain period? 
CQ2: Which workspace s environmental conditions are related to the reported thermal preferences? 


To address the above CQs, a customized “workplace performance dashboard” service has been implemented 
within the discussed ICDD web platform (Fig. 6). On the left-hand panel an authorized user can retrieve and 
access all the resources and contents organized in separate ICDDs containers, one for each identified workspace 
(i.e., case study office “IC6-83+85”). In the Ontology Resource folder, the data structures necessary for the 
semantic formalization of the containers contents, link and domain knowledge are stored. The Payload 
documents folder contains the workspace IFC model and the proxy documents corresponding to each installed 
sensor. In turn, these can be accessed to retrieve the related observations from the timeseries database. Besides, 
the reified worker, feedback, sensor data and internal links are stored in the Payload triples folder. 


nea RUR ICD PLATFORM now ‘ FES MANAGE MAE Ret MEMTATICN APL CONTACT ADMIN ANA A teerneges Betmin (Kt =~ RUB 


ae aeons Workplace Performance Dashboard 


r 
8} 


Fig. 6: Workplace performance dashboard (ICDD web platform) 


In the central panel the user is provided with a graphical interface for querying and visualizing the data. The 
proposed approach for workplace performance assessment is centred on the evaluation of the conditions 
perceived by the employees and expressed with their feedback responses, hence the user has two filtering options 
to reduce the scope of the query: first, the object of the assessment must be chosen among preferred workspace 
environmental properties, activities performed by the workers or their experienced conditions; then, the target 
time interval must be specified. For the presented use case, feedback data are filtered for thermal quality 
preferences expressed within the data collection period of 10-14" July, 2023. In the Overview section the 
distribution of the responses is then returned according to the adopted rating scale and in relation to the mean 
values of the sensor observations at the corresponding feedback times. Therefore, the manager can not only 
assess the workplace performance in terms of overall thermal quality (CQ1) but also understand which 
conditions contributed the occupants’ thermal comfort and take better informed actions to mitigate the 
occurrence of unsatisfying conditions (CQ2). In this regard, the bidirectional link established between the sensor 
and feedback data with the corresponding element of the workspace IFC model showed in the right-hand viewer 
contributes to enhance the visualization of the queried data. In fact, selecting one retrieved feedback instance the 
workspace element related to its source location is highlighted; conversely the selection of another linked 
element in the model can be used for further filtering the query results (i.e., per workstation). 
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6. CONCLUSIONS 


The improvement of workers’ well-being and productivity in existing indoor workplaces can be achieved with 
the adoption of occupant-centric approaches based on the understanding of the complex building/worker 
interactions. For this purpose, this paper presents the concept of a semantic digital twin that enables the linking 
and contextual interpretation of building, sensor, and worker data to support workplace management use cases. 
The system requirements are identified along with a comprehensive four-layered architecture, and the system’s 
prototypical implementation is discussed. The adoption of commercial Zigbee devices and the MQTT standard 
protocol for data communication are proved effective for the deployment of an affordable, flexible, and scalable 
sensor network. An InfluxDB database is implemented to efficiently store and easily access both sensor and 
feedback timeseries data, the latter collected with a custom developed web survey application. The core digital 
twin services related to the semantic integration, querying and visualization of the heterogeneous workplace data 
are realized with the adoption of standardized ICDD containers and Semantic Web technologies, enabled 
through a custom developed web platform. Eventually, the system prototype’s capabilities for the assessment of 
workplace performance are demonstrated with the correlation of workers’ thermal preferences and workspace 
condition observed for the case study. 


At this development stage, the proposed concept still presents several limitations that shall be addressed with 
further research. The extension of the digital twin prototype in terms of number of monitored workspaces, 
building and worker features considered, employees involved, and feedback collection period is currently 
undergoing to test it against a building-scale application scenario. Furthermore, additional platform’s services are 
under development to investigate semantic reasoning opportunities that the system can provide for workplace 
issue discovery and spatial recommendations. 
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ABSTRACT: Digital Twin (DT) developments and applications in the Architectural Engineering Construction 
(AEC) Industry are emerging. However, insufficient publications synthesised the existing literature on DT of 
existing buildings, including energy retrofit and challenges as part of Net-zero strategies. When developing DT 
systems, it is vital to include the existing buildings primarily captured in 2-Dimensions (2-D) static data. To date, 
the implementation of DT has been minimal in applications in existing buildings in the UK. Despite DT benefits 
for maintenance (O&M) managers, facilities management (FM) as a comprehensive source of consistent data for 
predictive maintenance. This study explored the challenges faced by DT adoptions in existing buildings through 
a systematic review of the extant literature. A systematic approach is adopted to search the Scopus database using 
relevant keywords such as "Digital Twin.", "Built Environment" and "Existing Buildings.". the study focused on 
publications from the past five years (2018 to 2023) and prioritised articles in Scopus. The findings of this paper 
showed that the practitioners, O&M managers, and academics in built environments need more proper knowledge 
and technical expertise on digital twins as part of Industry 4.0 (14.0). Evidence from the literature resulted in low 
empirical case studies and applications. The complexity of real-time data integration and interoperability were 
highlighted as part of the challenges despite the need for comprehensive knowledge of DT in the built environment. 
Scarce publication on the study was noted. The directions for comprehensive solutions and future research on 
digital twin applications in existing buildings towards achieving efficient energy retrofits, cost reductions, and 
net-zero goals were highlighted. 


KEYWORDS: Digital Twin, BIM, Data, Buildings, Energy, Management. 


1. INTRODUCTION 


The emergence of Industry 4.0 has shifted the trajectory in the built environment. Digital Twin (DT) is essential 
to implement Building 4.0 (Delgado et al., 2023). There is more interest in DT technology as a building block of 
the metaverse and a vital pillar of Industrial 4.0 that needs to be harnessed (Hassani, Huang, & MacFeely, 2022). 
What is the place of DTs in constructed facilities projects? (Khallaf, Khallaf, Anumba, & Madubuike, 2022). The 
Digital Twin framework is based on Building Information Modelling (BIM) and a newly created plug-in to receive 
real-time sensor data from the physical instance (H. Hosamo, Hosamo, Nielsen, Svennevig, & Svidt, 2023). The 
built environment needs to move from a static sustainability assessment to a DT- based and IoT dynamic approach 
to turn climate and environmental challenges into opportunities and support the sustainability decision processes 
throughout the whole building’s life cycle (Tagliabue et al., 2021). DT has become a ‘hot topic’ among academic 
circles and commercial communication in the industry. However, the DT is often misunderstood or misused as it 
is ‘trendy’(Zhou, Zhang, & Gu, 2022). 


Why DT instead of a traditional monitoring system? Traditional or human monitoring systems resulted in 
wastages, siloed documentation, and incorrect monitoring activities (Agrawal, Thiel, Jain, Singh, & Fischer, 2023; 
Khalil, Stravoravdis, & Backes, 2021; Sagarna, Otaduy, Mora, & Leon, 2022). Change is required, and DT could 
solve these problems better. The awareness of the interaction of humans and DT is vital to eliminate costs, strategic 
misalignments, misallocation of resources, and unrealistic expectations from DTs due to DT technology’s 
immaturity (Agrawal et al., 2023). 


DT is needed for prediction and stimulation due to the inability of the 2-Dimensioal (2D) or 3-Dimensional (3D)- 
BIM) data on its own to give a suitable platform for prediction. The local and global market dynamic changes 
require more robust and innovative operational BIMs in the AEC-FM sectors. The static 3D model needs an active 
approach (Harode, Thabet, Jamerson, & Dongre, 2023). However, the existing maturity models lack DT 
implementation comprehensively and quantitatively (Chen et al., 2021). The levels and advancement of 
digitalisation in the AEC industry have different capabilities. The AEC industry is fragmented into the pillars of 
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SECTION F - DIGITAL TWIN 


Industry 4.0: the Internet of Things (IoT), big data, augmented reality, advanced visualisation, Virtual Reality 
(VR) and simulation, additive manufacturing, system integration, cloud computing, autonomous systems, and 
cybersecurity (Pour Rahimian, Dawood, Ghaffarianhoseini, & Ghaffarianhoseini, 2022). There are issues with 
minimising the cost and manual labour of the automated segmentation of individual instances for more efficiency 
and valuable outcomes for geometric digital twins (Agapaki & Brilakis, 2021). 


The global goals to address environmental challenges in the construction industry and operational assets' life 
cycles warrant a change of approach. The concept of digital twins’ application as one of the solutions has a 
knowledge gap for its adoption in the industry. Integrating IoT, BIM and AI as parts of the digital twin to automate 
the monitoring and control of emissions from existing assets evidence interactive trends and patterns from 
collected data through the integration of machine learning. It enhances facility management as a potential for net- 
zero targets. However, there are limitations, such as digital shadow usage instead of real-time digital twins 
(Arsiwala, Elghaish, & Zoher, 2023). Nearly Zero Emission Buildings (NZEBs), reducing the energy consumption 
of the existing building is necessary, and the need to manage energy consumption is supported (Agostinelli, Cumo, 
Guidi, & Tomazzoli, 2021; Francisco, Mohammadi, & Taylor, 2020; Kaewunruen, Rungskunroch, & Welsh, 2019; 
Tang et al., 2023). Still, the viability of digital twins' financial and technical implementation has been questioned. 
However, the BIM-Digital Twin integration for detailed energy stimulation was used in a case study in the United 
Kingdom (UK); it was proven that digital twin implementation has the potential for a 23-year return period for 
renewable technology for an existing building (Kaewunruen, Rungskunroch, et al., 2019). 


The DT-based approach could provide an adaptive comfort model, energy-saving strategies, and building comfort 
optimisation. Exploiting the digital twin approach supports sustainability decisions through the whole adoption 
cycle for climate issues and environmental challenges; it can be used for energy-saving in different types of 
buildings (Tagliabue et al., 2021). The effectiveness of implementing a digital twin for asset management is 
lacking for adoption in the housing sector and industry practitioners for predictive monitoring in buildings 
(Arsiwala et al., 2023). Existing research indicates that DT is still needed in stimulation during system run-time 
and different lifecycle phases in academic and industrial communities (M. Liu, Fang, Dong, & Xu, 2021). 


With all the above in mind, this paper aims to develop a direction for research on digital twin applications in 
existing buildings towards achieving the net-zero target. The objectives are 1) to explore the challenges in using 
DT in existing buildings? And 2) to investigate how the built environment sector could use predictive maintenance 
system-based DT in the UK. The rest of the paper sections are 2) the adopted methodology in the study, 3) digital 
twins research for predictive maintenance, 4) research significance and contributions, and 5) the conclusion and 
recommendations for future studies to leverage the adoption of DT for predictive maintenance in the built 
environment. 


2. METHODOLOGY 


According to Webster and Watson (2002), the literature review aims to uncover the past to identify the gaps and 
chart future directions by identifying aspects that research should focus on. The benefits of using a systematic 
literature review in this paper are: 1). to identify where evidence may be lacking, contradictory or inclusive, and 
2). justify why a problem is worthy of further study (Aromataris & Pearson, 2014). See Figure 1. 


Keyword “Digital Limited to Articles Keyword within 
Twin” (n=5,896 documents) “Existing Building” 
(Based on titles, (n=90 documents) 


Content analysis. 


abstracts, and (n=76 documents) 


keywords) 


14,263 Documents Keyword within Limited to English 


“Built Environment” Language 
(n=427 documents) (n=89 documents) 


Fig 1: Systematic Selection Flowchart 


The systematic review methods framework is based on the Initiation, Exclusion, Exclusion and Final phases 
criteria based on research studies by Kitchenham et al. (2010) and Mastan, Sensuse, Suryono, and Kautsarina 
(2022). For the inclusion and exclusion stage, Scopus was used to filter relevant keywords for the study to reduce 
the likelihood of bias, as the largest database of peer-reviewed research compared with Google Scholar or Web of 
Science and PubMed. The keywords searched from titles and abstracts were “Digital Twin”, “Built Environment”, 
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and “Existing Building” as the relevant terms to the research study and the domain, which were used on papers 
published over a timeframe between 2018 and the present. 


Articles were chosen from different journals written in English. The selected articles were then narrowed to 76 
articles in the Scopus database using the process described in Figure 1. The articles were categorised broadly into 
1). BIM-DT Maturity Model 2). DT in the AEC industry 3). DT for Maintenance of Buildings 4). DT for Net Zero 
in Existing Buildings (NZEB) 5). DTs for Facility Management 6). DTs for Energy Management. However, the 
emphasis was an overview analysis of the DT technology for existing buildings management within the AEC 
industry. NVivo software was used for the systematic content review and visualisation analysis. The NVIVO 
software aided rigorous data extraction, evaluation, and categorisation of the large amount of data generated by 
the 76 selected articles. 


3. DIGITAL TWINS RESEARCH FOR PREDICTIVE MAINTENANCE 


The following sub-sections will show cases of the attributes and challenges of DT adoptions within the articles 
selected. These are: 1). BIM-DT Maturity Model 2). DT in the AEC industry 3). DT for Maintenance of Buildings 
4). DT for Net Zero in Existing Buildings (NZEB) 5). DTs for Facility Management 6). DTs for Energy 
Management for an overview of limitations faced within the built environment. 


3.1 BIM-DT Maturity Model 


The emergency of real-time connectivity and deployment in an environment increases the potential for DT 
concepts in the built environment. Noted to be in an early stage of adoption, the analysis indicated unrealised and 
unexploited possibilities of the DT concept for building management; further studies as a baseline were 
recommended (Deng, Menassa, & Kamat, 2021). BIM and DTs are disruptive technologies to be embraced by the 
construction industry due to the recent increase in advanced automation and autonomous technologies at the 
organisation, and need to be adequately leveraged (Sepasgozar et al., 2023). Notably, DT proved to be a valuable 
tool for the entire lifecycle management of buildings since BIM cannot handle the sensory information and 
complexities of existing buildings (Banfi, Brumana, Salvalai, & Previtali, 2022). 


Existing BIM standards (ISO 19650) were proposed to promote a better interoperable digitalised built 
environment to develop DTs in the AEC sector (Nour El-Din, Pereira, Poças Martins, & Ramos, 2022). DT-BIM- 
based model reduced load bearing and risk in existing highway infrastructures. However, concerns were 
information collection and analyses for inspection, maintenance planning, and data sharing and visualisation 
limitations. The BIM-based model provided solutions and insights for high-quality construction and maintenance 
to minimise rapid degradation and component failures in infrastructures. Owners and stakeholders would achieve 
higher operational efficiency and sustainability outcomes over the life cycle, especially the expensive phases 
(Kaewunruen & Lian, 2019). DT-BIM-based model is recommended for the whole life-cycle mitigation of risks 
and uncertainties of exposure to extreme weather conditions for construction industry stakeholders (Kaewunruen, 
Sresakoolchai, Ma, & Phil-Ebosie, 2021). 


Integrations of human-centred approaches, Virtual Design Construction (VDC), digital twin and artificial 
intelligence will transform the future of the AEC industry and create research opportunities. The integrations can 
simultaneously optimise, predict, and provide significant cost savings for the industry work processes, which 
should be noted for the main line of future research (Rafsanjani & Nabizadeh, 2023). BIM, IoT, and Al-supported 
systems are beneficial for prediction and data-driven retrofitting strategies, a step towards achieving the net-zero 
targets (Arsiwala et al., 2023). Multi-layer DT called BIM-IoT-Data integration (BIM-IoTDI) indicated 
capabilities to overcome interoperability and be suitable for smart buildings (Eneyew, Capretz, & Bitsuamlak, 
2022). AI, DTs, and scanning technologies are valuable for maintenance strategies. However, some challenges are 
associated with technological, cultural, market and regulatory factors (Cetin, Gruis, & Straub, 2022). 


Two-directional interactions between humans and computers should be part of the development to reach a high 
maturity level. Further studies, including advanced technologies like AI, BIM, and Geographic Information 
System (GIS) cloud computing, are essential to cope with the complex urban challenges of multidisciplinary DTs 
(Masoumi, Shirowzhan, Eskandarpour, & Pettit, 2023). Automation and robotisation are resources for 
management improvement in construction and existing buildings-the BIM data are programmed to a robot and 
visual format as the digital twin of the real-world building revealed valuable results that need further studies on 
the data reliability, interoperability of the BIM and 3D-Robot connections, and update of the BIM model based 
on robot feedback (Pauwels, de Koning, Hendrikx, & Torta, 2023). Likewise, the Mobile BIM Technology (MBT) 


1208 


functions can help researchers and practices in digitalisation, lean construction, evaluation perspectives and data- 
driven approaches to enable DT. Legal and security perspectives should be considered (Jowett, Edwards, & 
Kassem, 2023). 


The proposed DT and Cloud BIM-Extended Reality (XR) platform development using the Scan-to-BIM-to-DT 
process to a 4D multi-user live app to improve building comfort, efficiency and costs was validated for energy 
improvements for existing buildings and facade renovations (Banfi, Brumana, Salvalai, et al., 2022). Implemented 
DT used scan-to-BIM with sensor data as an Industry Foundation Classes (IFC) BIM platform to monitor the 
structural health of the building walls. It can benefit FMs for planning and maintenance activities and the long life 
cycle of a building (Longman, Xu, Sun, Turkan, & Riggio, 2023). The lack of a clear process roadmap is partially 
a factor. Still, Digital Twin Construction was proposed as a comprehensive construction mode that prioritises 
closing the control loops instead of an extension of BIM tools integrated with sensing and monitoring technologies 
(Sacks, Brilakis, Pikas, Xie, & Girolami, 2020). DT cannot be limited to the focused aspects like 3D modelling, 
monitoring, and visualisation of DTs (Masoumi et al., 2023). Tagliabue et al. (2021) supported shifting from a 
static sustainability assessment to a digital twin-based and IoT-enabled dynamic approach for real-time evaluation, 
sustainability criteria control, and user-centred viewpoint in the built environment. A roadmap is required to 
support decisions and policymakers to aid implementations for the next ten years (2030) (Sepasgozar et al., 2023). 


It should be noted that DTs are extensions of engineering information (EI) leveraging modern information and 
communications technology (ICTs) and do not have their unique technical characteristics (Zhou et al., 2022). The 
integration of DT roles as an observer, analyst, action executor and decision-maker is useful in helping 
practitioners systematically plan DT deployments, clearly communicate goals and deliverables, and lay out a 
strategic vision (Agrawal et al., 2023). Digital Twins are the future of the metadata required for a socio-technical 
collaborative and sustainable ecosystem based on a virtual representation of the physical instance in a real-time 
operational state superseding static 3D- Building Information Modelling or traditional 2D of assets for the 
multidisciplinary AEC domains for operational efficiency and cost reduction (Pregnolato et al., 2022) updated 
information to analyse and optimise ongoing design, planning and production, lean construction, and data-centric 
construction management (Sacks et al., 2020). 


The widespread traction of DTs for constructed facilities projects reveals the benefits of DTs as the ability to 
reduce operating costs and human error, automate energy demand, manage assets throughout their lifecycle, and 
structural health monitoring in real-time (Khallaf et al., 2022) with robot-Assisted reality capture in larger spaces 
(Xu, Xia, You, & Du, 2022). The application of DTs and Machine Learning helps predict and analyse prestressed 
steel structures needed for intelligent monitoring techniques and control methods to ensure safety (Z. Liu, Yuan, 
Sun, & Cao, 2022). IoT and DT connection and integration produce real environments to boost automation, 
Industry 4.0 transformation and cost-effectiveness (Al-Dhlan, 2021) with robot-assisted reality capture in larger 
spaces (Xu et al., 2022). Likewise, DTs in the UK are used for highway asset management to achieve minimum 
human input and have very high accuracy (Jiang, Ma, Broyd, Chen, & Luo, 2022). 


Multi-layer DTs proved to be a valuable tool for the entire lifecycle management of buildings since BIM cannot 
handle the comprehensive DT technology that can be supported with socio-technical approaches to reach maturity. 


The integration of IoT with DT has proven to be valuable with the application of DTs in the multidisciplinary 
AEC domains, even though some challenges were indicated in Tables 1-5. 


3.2 DT in the AEC Industry 


The articles in Table 1 are on DT research in the AEC Industry and the challenges encountered. Infrastructure 
assessment and construction management were highlighted. 


Table 1. Articles on DTs studies within the AEC Industry. 


Author/Year Focus of Study Employed Tools Challenges /Limitations 
(Camposano, Smolander, Awareness of DT of Built Interpretive analysis Inter-organizational 
& Ruippo, 2021) assets relative to the software of semi-structured interviews relationships, data sharing, and 
ecosystem technical expertise 
(Cetin et al., 2022). Adoptions of digital Multiple- case study Circular Technical, cultural, market and 
technologies in social housing Economy Digital Twins regulatory issues. Misalignment 
framework of supply and demand. Data 
quality and sharing 
(Eneyew et al., 2022) Interoperability of smart- BIM-IoT Digital integration Autonomous query decision 
building DTs and data sharing 
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(Jowett et al., 2023) 


Mobile BIM technologies 
taxonomy in field BIM 
interactions with construction 
management functions 


A longitudinal case study over 12 
months, two project workshops, 
expert interviews and an industry 
survey at project, enterprise, and 
industry levels. 


Lack of delineation between 
related terms used. 


(Kaewunruen et al., 2021) 


Vulnerability assessment and 
risk-based maintenance of 
infrastructures 


BIM-DT based model 


Automated interaction of data, 
3D modelling human resources 
and standards, IT software 
natural and human risk 
assessments. 


(Longman et al., 2023) 


The implementation of DTs to 


Scan-to-BIM approach in IFC- 


Data integrations and project 


support structural health BIM platforms. infrastructure 
monitoring (SHM). 
(Nour El-Din et al., 2022) Status of DTs and the Systematic Review Lack of data standardisation 


evolution of the concept of 
DTs for construction Assets 


(Pauwels et al., 2023). 


Live semantic data transfers 
from building digital twins for 
robotic navigation 


BIM-based model 


(Rafsanjani & Nabizadeh, 
2023) 


Cost-saving DT capabilities in 
construction and operational 
phases 


VDC-DT based model + VR+AR 


Automation, data reliability 
and standards 

Technical expertise, 
Compatible software, 
Automation, Human-centric 


scenarios, real-time data senses 
and analysis. 


(Sacks et al., 2020) 


Digital twin information 
systems to achieve closed-loop 
control systems to monitor 
construction. 


Conceptual analysis 


Lack of comprehensive 
information system and 
knowledge 


3.3 


DT for Maintenance of Buildings 


Table 2 comprises articles on building maintenance and the challenges of adopting DTs. Emphases are mostly on 
historical building restoration. 


Table 2. Articles on DTs exploration for maintaining structural buildings. 


Author/Year Focus of Study Employed Tools Challenges /Limitations 
(Ali, Alhajlah, & Kassem, The status and future trends in Systematic literature review The complexity of the project, 
2022) building information modelling (SLR) methods through co- integration of components and 
(BIM) occurrence and co-citation data 
analysis. 
(Banfi, Brumana, Landi, Data categorising in the digital HBIM model Protection and longevity of 
et al., 2022; Khalil et al, documentation of heritage heritage buildings data. 
2021) buildings Interoperability and BIM 
standardisation or extended 
reality environments and data 
formats. 
(Cardinali et al., 2023; Digital twins for heritage Historic building information Data acquisition constraints and 
Moyano, Carreño, Nieto- buildings. modelling (HBIM) geometric difficulties, the scale of the 


Julián, Gil-Arizónņ, & 
Bruno, 2022; Noronha 
Pinto de Oliveira e Sousa 
& Correa, 2023). 


model. 


building. 


(Daniotti et al., 2022). BIM-DT Based Interoperability | BIM-DT based model Data integration and 
for efficient renovation in standardisation processes 
buildings 

(Pan, Braun, Brilakis, & | How to enrich geometric digital 3D point cloud + Al-based Variation in objects, data 


Borrmann, 2022) 


twins of buildings, particularly 
emphasis on capturing small 
vital entities in buildings 


image segmentation 


collection and occlusion of the 
environment 


(Porsani, de Lersundi, 
Gutiérrez, & Bandera, 
2021) 


Evaluates of an automated or 
semi-automated BIM to BEM 
workflow in buildings 


BIM-BEM model 


Technical expertise, large and 
complex buildings 


(Sagarna et al., 2022) 


Documentation and displaying 
inspection-related information 
in BIM models to generate a 
dynamic information model in 
buildings. 


Innovative BIM model-State of 
Conservation Assessment BIM 
Model (SCABIM) 


Dynamic open-source data 
integration, technical 
workforce, and equipment 


(Tan, Leng, Zeng, Feng, & 
Yu, 2022). 


DT-3D digital replicas record 
and heritage buildings survey 


BIM 
approach, 


multi-methodological 


Obstruction of the surrounding 
environment and data 
exchange. 
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3.4 


SECTION F - DIGITAL TWIN 


DT for Net Zero in Existing Buildings (NZEB) 


Table 3 includes related articles on efforts towards NZEB and the challenges faced while adopting DTs. BIM- 
based models were majorly used to integrate data for simulations. 


Table 3. Articles on NZEB research using DTs. 


Author/Year Focus of Study Employed Tools Challenges /Limitations 
(Godager, Onstein, & BIM in asset and facilities Enterprise BIM (EBIM) Data integration, security and 
Huang, 2021) management. management, Naming 

convention, Data 

infrastructures, and suitable 

common data environment. 
(Kaewunruen & Xu, 2018) Assessing the carbon BIM-enabled data visualisation Computational and technical 


footprints of a building in the 
design process using BIM 
technology. 


+ API 


expertise, natural light area and 
data extraction 


(Kaewunruen, Peng, & Sustainability and DT+BIM Computational and technical 
Phil-Ebosie, 2020; vulnerability in buildings expertise, structural tolerances, 
Kaewunruen, and autonomous integration. 
Sresakoolchai, & 

Kerinnonta, 2019) 

(Lu, Xie, Parlikad, BIM to digital twins for Smart asset management (DTs Technology, Organisation, 
Schooling, & operation and maintenance + AI + ML+ data analytics) information, and data standard- 
Konstantinou, 2020) related issues. Domain 


alignment, Interaction between 
domain and all stakeholders 


(Ochs, Franzoi, Monitoring and simulation- MATLAB Simulink simulation Cost analysis, different heat 
Dermentzis, Monteleone, based optimisation of two +Building management system sources, and simulation 

& Magni, 2023) multi-apartment for NZEBs 

(Shen, Ding, & Wang, Whole-life-cycle net-zero- DT+BIM Building lifecycle stages. 
2022) carbon buildings Automated system and dynamic 


sensor data 


3.5 


DT for Facility Management 


Related articles on building facility management and the challenges of adopting DT approaches are in Table 4. 


Studies were mainly based on existing buildings' operation and maintenance phases. 


Table 4. Articles on facility management research using DT. 


Author/Year Focus of Study Employed Tools Challenges /Limitations 
(Badenko et al., 2021) Integration of digital twin and DT +BIM (“Factories of the Systems integration, Training 
BIM technologies in FM Future” framework) and investment cost and 
technical expertise 
(Chacón et al., 2023) Structural Health Monitoring BIM -DT-based model Data integration and 


(SHM) systems within the DT 
platform 


interoperability, and multiple 
sources pipelines. 


(Chen et al., 2021) 


An innovative maturity model 
for measuring digital twin 
maturity for asset management. 


DT +BIM (Gemini Principles) 


Domain environment, technical 
expertise, Cultural and policy 
influence 


(Costa, Arroyo, Rueda, & 
Briones, 2023) 


A ventilation early warning 
system (VEWS) for FM 


Smart Campus Digital Twin 
(SCDT) framework (BIM +IoT 
+AD) 


Characteristics of building 
workspaces, data integration 
and interoperability of systems 


(Fialho et al., 2022) Prototyping a BIM and IoT- BIM and IoT-based Lack of consistent tools, 
based smart lighting methods, and devices for 
maintenance system for the FM measuring building 
sector components' performance and 

restrictions in research 
resources 

(Harode et al., 2023) System architecture fora digital Structured literature search and Data integration, standard, 
twin in a healthcare facility analysis +DT environment, interoperability, 


(FM) and technical expertise 
(H. H. Hosamo, Nielsen, Automated fault source DT + BIM model based on IoT connectivity, BIM systems, 
Kraniotis, Svennevig, & detection and prediction for Bayesian networks (BNs). technical and computational 
Svidt, 2023a, 2023b) comfort performance expertise, interoperability of 


evaluation of existing buildings 


systems, occupant profile and 
owner funding. 


(Jiao et al., 2023) 


A sustainable digital twin (DT) 
model of operation and 
maintenance for building 
infrastructures, 


DT +Bayesian network (BN) + 
Random Forest (RF) 


Small buildings, equipment, 
building space, data collection 
and systems 
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(Khajavi, Tetik, Liu, DT for safety and Security in DT+BIM Awareness, cost, data type, real- 

Korhonen, & Holmstrom, building Lifecycle time studies, and access to case 

2023) studies on other buildings 

(Levine & Spencer, 2022) — Post-Earthquake Building BIM-Based Digital Twin Building components and 
Evaluation. Framework alignment, BIM systems and 

computational systems 
(Lu, Xie, Parlikad, &  Visualised inspection system AR-supported automated Environmental conditions, Data 
Schooling, 2020; Xie, Lu, for monitoring during daily environmental anomaly collection, processes, and 


Rodenas-Herraiz, 
Parlikad, & Schooling, 


Operation and Maintenance 


detection and fault tree analysis 
method 


computational expertise 


2020) 

(Moretti, Ellul, Re Integration of GIS and BIM for DT +BIM +GIS (Geographic Technical complexity, digital 
Cecconi, Papapesios, & built environment condition Information System) expertise, automatic semantic 
Dejaco, 2021) assessment in asset mapping, stakeholders, 


management decision-making 


information gathering, data and 


systems integrations 


(Pan, Braun, Borrmann, & 
Brilakis, 2022) 


How to generate geometric 
digital twins of the indoor 


3D deep-learning-enhanced 
void-growing approach 


Computational efforts, point 
cloud data and occlusion of the 


environment of buildings environment 

automatically. 
(Rampini & Re Cecconi, Artificial intelligence in Literature review +bibliometric Time-consuming and labour- 
2022) construction asset management analysis intensive, technical expertise, 

for sustainability Data collection and processes 
(Shahinmoghadam, The benefits of BIM, the IoT BIM+ IoT +VR Sensor placement, Building 
Natephra, & Motamedi, and Virtual Reality (VR) for space, and semi-automated 
2021) thermal comfort conditions registration of the thermal 


image’s method 


3.6 


DT for Energy Management 


Articles focusing on the efficiency of building energy management and the challenges of adopting DT approaches 
are in Table 5, exploring energy cost and consumption reduction. 


Table 5. Articles on energy management research using DT. 


Author/Year Focus of Study Employed Tools Challenges /Limitations 
(Borja-Conde, Automatic thermal models of High-fidelity simulator ML techniques, the accuracy 
Witheephanich, existing buildings for energy software TRNSYS and robustness of building 
Coronel, & Limon, management thermal behaviour modelling 
2023). 

(Corrado, DeLong, Green metrics and digital twins Published literature review Real-world metrics standard, 

Holt, Hua, & Tolk, for Sustainability planning and computational decision support 

2022). governance 

(Delgado et al., 2023) The interconnection between BIM BIM-BEM (Drone based) Technical expertise and 
and building energy modelling framework occlusion of the environment 
(BEM) for energy cost reduction. 

(H. Hosamo et al, A digital twin of heating, BIM +DT (MATLAB + Data standards and type, 

2023). ventilation, and air conditioning Artificial neural network + Ontology techniques to 
for optimisation of energy multi-objective genetic integrate BIM, energy 
consumption and thermal comfort algorithm) management, and thermal 
based. comfort data in one framework. 

(Hosseinihaghighi etal., Assessment of housing stock and Smart thermostat integration Data quality and collection, 

2022) smart thermostat data in support model Classification standard 
of energy end-use mapping and terminology, occupants’ 
housing retrofit program planning behaviours, Building 

characteristics, real-time data 
integration and interoperability 

(Kaewunruen, Reconstruction design of an  BIM-based digital twin Data integration, financial 

Sresakoolchai, et al., existing building energy building analysis, different functions of 

2019) goal renewable technologies, and 

computational supports. 

(Lamagna, Groppi, Digital twins for smart energy A comprehensive literature Big data, communication 

Nezhad, & Piras, 2021). | management system review protocols, lack of regulation and 

a transparent market, uncertain 
and unclear framework, 
different versions of DTs and 
algorithms. 

(Spudys et al., 2023). Operational energy performance DT model Lack of required equipment, 
of buildings with the use of digital digital environment, 
twins infrastructure, occupant 

behaviour and building 


automation systems 
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(Tang et al., 2023). Vertical greenery system (VGS) DT + BIM model Direction and location of the 


renovation for building energy building and environmental 
efficiency analysis based on conditions 
digital twin 

(Zhao et al., 2022). Evaluation of public toilet BIM-based digital twin Building cost and construction 
ventilation design schemes simulations computations, 
through a digital twin to maintain cloud-based services, 
high environmental quality computations support 


4. RESEARCH SIGNIFICANCE AND CONTRIBUTION 


The research's significances are to remove silos and allow data and information sharing for effective and holistic 
implementation and interactions that reduce wastage and costs in retrofitting and energy consumption in existing 
buildings as one of the largest energy consumptions. The research would add insights into energy retrofit and 
management cost as part of Net-zero strategies in the UK. 


Analysing the existing literature on DT attributes and challenges provides a glimpse into the reality of insufficient 
research studies on DT globally, especially in the UK; 25 articles from the UK were within the research scope. 
The analysis would contribute to the body of knowledge in the UK and globally. The literature pointed out the 
confusion of DT as an extended branch of BIM. However, DT has been revealed as a multi-objective and multi- 
scale technology field within the 14.0 that requires an advanced analytical approach through data and stimulation 
to support comprehensive and predictive decision-making to save costs and wastages. The door is widened for 
researchers to explore viable solutions to overcome the challenges uncovered in the literature to leverage DT 
benefits within the built environment. 


The research would contribute and benefit suppliers (technologies and platforms), homeowners, landlords, 
policymakers, operation and maintenance (O&M) managers, facilities management (FM), and the research 
community to reduce costs with predictive decisions based on proactive maintenance rather than reactive 
maintenance. It will replace speculative retrofitting actions with factual insights for different types of existing 
structural buildings, occupancy behaviour and locality. There are reasonable limitations within the period of the 
systematic review and the platform used to select optimal articles based on the research topic due to the low 
number of publications. However, more publications could have been added from other platforms like Google 
Scholar but not peer-reviewed to ascertain the quality required to maximise the value of the review. 


5. CONCLUSIONS AND FURTHER RESEARCH 


Digital Twins can integrate information from multiple sources to replicate the operation of the physical spatial 
asset in real-time for stimulations and predictive insights. The application of digital twins is still in an emergency 
phase as a valuable digital technologies advancement, especially with the built environment and the AEC industry 
in the UK. There needed to be more publications on the research topic. The paper set out two objectives: 1) to 
explore the challenges in using DT in existing buildings. And 2) to investigate how the built environment sector 
could use predictive maintenance system-based DT in the UK. 


DT needs to be more understood and consistent in the application or technical tools usage from literature. In 
addition, practitioners, AEC managers and academics need more proper knowledge and technical expertise on 
digital twins as part of the 14.0, as evidence from the literature resulted in low empirical case studies. The 
complexity of real-time data integration and interoperability were highlighted as part of the challenges. Digital 
Technologies can be pardoned as a common word for 14.0, are the enabler of all societies and organisations’ 
productivities, previously a buzz, now overtaken by Artificial Intelligence (AI). The direction to advance digital 
twins’ adoptions in existing buildings towards achieving net-zero goals is realism 1). Training for applicable 
practitioners in the built environment for the AEC industry. 2). Affordable computations and simulation tools and 
systems 3). Cloud-based architecture infrastructure and services 4). Data integration and interoperability 5). Open 
sources collaboration 6). Socio-technical influences and sustainable policies. 


Finally, based on the literature, predictive maintenance system-based DT with intelligent analysis supported 
comprehensive and consistent solutions for rehabilitation, operation, and maintenance management of existing 
building life cycle management. However, more steps are needed to leverage the benefits: the future case study 
for the built environment sector to use predictive maintenance system-based DT in the UK are 1). the depth of 
knowledge of DT with professionals in the built environments, 2). A standard benchmark framework for DT 
adoptions 3). Available and accessible computations simulation tools, and systems 4). Funding for the research 
5). Socio-Technical influences and training. 
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ABSTRACT: In supporting the economic growth, Indonesian government has instructed to develop 201 National 
Strategic Infrastructure Projects, including Ameroro Dam Project. Located in Southeast Sulawesi, the construction 
process faced many engineering challenges with conventional monitoring methods, such as potentially delayed 
action plan and hindered decision making due to insufficient progress visualization data, inadequate real-time 
monitoring data, and unintegrated engineering data. Therefore, Project Management Information System (PMIS) 
dashboard is utilized as a Digital Twin innovation to overcome these challenges and optimize the project delivery. 
This study presents a case study approach on how PMIS could optimize the progress monitoring in Ameroro Dam 
Project. This PMIS Dashboard is integrated with Building Information Modelling, Digital Survey, Geospatial Data, 
and Project Management Data that supports the decision making as it provides more reliable data. This study 
illustrates the comparative study between conventional method and PMIS efficiency for a better project 
management. The effectiveness of PMIS can be seen as the integrated data is utilized to plan a construction 
working methods, along with monitoring the project schedule. Moreover, the visualization helps the engineers for 
a risk mitigation with the project performance display. Eventually, the paper concludes by the PMIS dashboard 
optimization for real-time progress monitoring in dam project, leading to more efficient infrastructure construction 
project management. 


KEYWORDS: BIM, Digital Twin, Construction Working Methods, Geospatial Data, Progress Monitoring, Project 
Management. 


1. INTRODUCTION 


As an agricultural country, the presence of Dam has always been considered as one of the most important 
infrastructures in Indonesia. As one of the most crucial engineering infrastructure, Dam is primarily used for water 
supply, flood control, agricultural irrigation, and hydroelectric power generation (Kalkan, 2014). Despite its many 
functions, the construction of Dam project is considered very complex. The construction of Dam project involves 
a large number of people with various objectives, interest, and disciplines to perform interdisciplinary activities. 
Moreover, time and physical resources limitations have added another dimension of complexity for dam project 
(Mahato & Ogunlana, 2011). However, many construction projects in Indonesia still apply traditional methods of 
project management which frequently caused a low productivity and waste a significant amount of time and 
resources due to poor prioritization and poor multi-tasking. Not only in Indonesia, but globally, construction is one 
of the biggest industry in the world that matters for the world economy. However, it has a long record of poor 
productivity. McKinsey (2020) reports the construction industry represents 13% of global GDP, but the 
productivity growth in construction only reach 1% annually for the past two decades. As shown in Figure 1, is 
significantly less than the productivity growth of the global economy, approximately 2.8% a year. 


The low productivity in construction caused by many factors, one of which in slow innovation and digitalization. 
As it can be seen on Figure 2, the digitization index in construction is ranked as the lowest 2 among other industry. 
Currently, many construction projects are still using traditional monitoring systems which is impossible to 
implement timely checks and repairs (Woo, 2010). Previous studies also identified several limitations of 
conventional monitoring methods such as the absence of project visualizations that can hinder the decision making, 
affected the project quality, time, and costs. 
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Figure 1. Global Labor Productivity Data (McKinsey Global Institute, 2017) 
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Figure 2. Industry Digitization Index (McKinsey Global Institute, 2016) 


As the construction projects are getting more complex, it needs more advanced and integrated tools to improve the 
construction process. Along with the advancement of information technology (IT) that have been rapidly evolving, 
Digital Twin and Building Information Modelling (BIM) are some of the latest technologies that are useful in terms 
of improving project management in construction. Digital Twin can be referred as a concept originated with the 
Internet of Things (IoT) that represents digital simulation within an IT platform by integrating physical feedback 
data, artificial intelligence (AI), and machine learning (Bi, 2022). Furthermore, Digital Twin contributes to smart 
construction in achieving economic and sustainability goals (Istanbullu, Wamuziri, & Siddique, 2022). As it 
improves the productivity in construction sector through digitalization, many studies discussed about the 
implementation of Digital Twin and BIM in the construction project. However, the least amount of research 
discussed the best practices of Digital Twin utilization for infrastructure project management, specifically dam 
project. Therefore, this paper aims to perform a digital construction innovation namely Project Management 
Information System (PMIS) dashboard as a Digital Twin platform and analyze the project management 
effectiveness based on a real case study of a dam project. Finally, some recommendations of PMIS future 
development are proposed in the conclusion for future infrastructure projects. 


2. LITERATURE REVIEW 
2.1 Project Management 


In order to success in delivering the expected project delivery, a good practice of project management should 
enable organizations to execute the project effectively and efficiently. Project management can be described as the 
implementation of knowledge, skills, tools, and techniques to project activities to meet the project requirements. 
Project management is accomplished through the correct execution and integration of the project management 
process identified for the project (Project Management Institute, 2017). Especially for construction project which 
involves many aspects, a good project management practice is required to ensure the project can be delivered on 
time. Moreover, construction project management can be described as a multidimensional discipline that requires 
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accurate consideration of various crucial aspects, including cost, quality, schedule requirements, as well as social 
and environmental impacts, and broader stakeholder interests (Ke, Zhang, & Philbin, 2023). 


2.2 Project Monitoring 


To achieve the project goals, there are several crucial things in practicing project management, one of which is 
monitoring. Especially for dams construction, monitoring is considered critical since deformation may occurred 
as a result of erosion, water load, hydraulic gradients, and water saturation (Kalkan, 2014). Besides, monitoring 
can be described as collecting, calculating, assessing measurement and trends to enhance the process 
improvements (Project Management Institute, 2017). It can be concluded that project monitoring is fundamental 
aspect for successful project management and decision-making. However, despite the importance of project 
monitoring, many construction projects currently constrained by the inadequacy of traditional monitoring system. 
Traditional monitoring methods are considered inaccurate, time-consuming, and labor-intensive because they rely 
on large-scale manual operations that will lead into project delays and cost overruns (Nakanishi, Kaneta, & Nishino, 
2021). Therefore, a good project monitoring system in a complex construction project is considerably important. 
For instance, continual monitoring is required to give the project management team with insight into the project’s 
health and indicates any areas that might require special attention (Project Management Institute, 2017). It is 
because successful construction projects are identified by the level of awareness of project progress or work 
performance. Moreover, a progress monitoring system should comply the information requirement for real-time 
progress and decision making (Teizer, Lao, & Sofer, 2007). Ideally, monitoring may fasten the decision makings 
with its reliable data. The monitoring activities process should enable stakeholders acquire a comprehensive 
overview of the project’s existing condition, identify issues for corrective action, and estimate future performance 
in terms of time and cost (Project Management Institute, 2017). As numbers of research have been conducted to 
reduce the gap between traditional monitoring and real-time monitoring, several approaches have been proposed 
for a better project management in construction monitoring. 


2.3 Project Management Information System 


Semi or fully automated data collection and analysis process can be a support in making quick and accurate 
decisions. Fortunately, technological advancement has been striving to improve the construction monitoring 
system in recent years. Numerous emerging automated data collection, analysis, and visualization techniques also 
have been utilized to develop systems for digitized real-time progress monitoring (Nakanishi, Kaneta, & Nishino, 
2021). Thus, the implementation of PMIS can be considered effective to improve the project management in a 
complex project. According to Project Management Institute (2017) PMIS required to collect, analyze, and use the 
information to accomplish the project objectives and realize the project benefits. Moreover, PMIS also provides 
access to Information Technology (IT) software tools such as scheduling software, that allows stakeholders to track 
planned dates versus actual dates, report variation between the progress performance against the schedule baseline 
and estimate the effects of modifications to the project schedule model. Beside the integration between the volume 
of data and information, PMIS also includes scheduling software that has the capability to help plan, organize, and 
adjust the sequence of the activities; it also expedites the process of building a schedule model by generating start 
and finish dates based on the inputs of activities, network diagrams, resources, and activity durations (Project 
Management Institute, 2017). 


2.4 BIM and Digital Twin 


As it can fulfill the aspects of the PMIS, the idea of collaborating BIM and GIS shows a great potential as a Digital 
Twin to improve the project management process. In the last few years, BIM known as a collaborative working 
method for the creation and management of a construction project (Acebes, Testa, Alonso, & Curto, 2023). BIM 
can be defined as a three-dimensional view that represents the building data to achieve the summary and integration 
of various information in construction (Yue, 2023). Furthermore, project progress cost, construction clashes and 
situation can be accessed through the animation in BIM. These functions are in line with the PMIS which include 
spreadsheets, simulation software, and statistical analysis tools to assist with cost estimating (Project Management 
Institute, 2017). However, the utilization of BIM only could not satisfy the PMIS objectives. While BIM tools 
provide excellent representations for product design, they often lack essential features necessary for construction 
when it comes to Digital Twins (Bao, Guo, Li, & Zhang, 2018). Digital Twin can be defined as a virtual 
representation of a physical asset that can reflect its current status. The data is collected through sensors and other 
monitoring devices embedded within the structure and transmitted to the virtual model in real-time (Ma, et al., 
2020). The implementation of Digital twins ensures the final result to be more accurate and reliable. Through 
accessing to this real-time data, stakeholders can quickly identify and respond to issues as they arise, leading to 
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increased efficiency and productivity (Jiang, Guo, & Wang, 2021). In this case, BIM can be utilized for visual 
management tools as the base information for Digital Twin to provide a high-level representation of buildings and 
their assets by integrating the physical and digital world (Hosamo, et al., 2022). Besides, Digital twin construction 
creates a data-centric way of construction management by combining BIM technology, lean construction thinking, 
the digital twin concept, and artificial intelligence (Bandara, Ranadewa, Parameswaran, & Eranga, 2023). Hence, 
the availability of current and historical data on the digital twin enables predictions of future behavior which 
beneficial for operation and maintenance phase from the infrastructure asset (Sivalingham, Sepuvelda, Spring, & 
Davies, 2018). Some technologies such as GIS and digital surveying are also utilized to obtain the real-time data 
of Digital Twin. Moreover, the presence of PMIS as digital twin may improves the overall project management 
performance as it displays the visualization of current project state, detect the potential risk, shows the project 
performance, as well as enhance the collaboration between stakeholders. However, as many observers addressed 
about the development of BIM and Digital Twin, none of the observers discussed about the best practices for dam 
project. This perspective is important since dam construction is considered very complex and required a good 
project management system. Hence, the proposed research is to identify the Digital Twin best practice for dam 
project, as well as analyzing the effectiveness for project management. 


3. METHODOLOGY 


The methodology used on this research is based on a case study approach of Ameroro Dam from the perspective 
as the general contractor for project management purposes during the construction. The obstacle inherent in the 
construction phase is that the monitoring process is still conventional, resulting the data obtained not being real- 
time, although this phase is critical for monitoring project performance to ensure quality, budget, and schedule. 
Therefore, the step to overcome this limitation is by doing digitalization with PMIS. The PMIS will automatically 
collect and process the data such as BIM Model, Digital Survey data, and project data for the input data. As the 
result, the PMIS platform will present the project data in a more effective way, especially to support the project 
management and decision-makings. 


3.1 Case study description 


As an agricultural country, the agricultural sector plays a crucial role and contributes to the country’s economy. 
According to data from Central Bureau of Statistics, Indonesia produces 31,36 million rices in 2021, and the 
government plans to increase the rice production targets up to 300.000 tons in 2024. Therefore, to support this 
policy, the Ministry of Public Works mandated to construct 65 priority dams through the National Strategic Project, 
one of which is Ameroro Dam Project. The Ministry of Public Works assigned the construction of this project to 
Hutama Karya as one of the Indonesia’s leading state-owned enterprises for the main contractor. Ameroro Dam is 
built as the 2"¢ dam in South East Sulawesi Province to increase the number of water reservoirs in Indonesia in 
order to support food security programs and water availability. This dam was designed to be a multipurpose dam, 
with a total capacity of 54.15 million m° and an inundation area of 212.89 Ha. With the project value of 38 million 
USD, Hutama Karya covers the scope of works namely spillway, access road, bridges, hydromechanical and 
electrical, and supporting and facility buildings. With a project duration of 945 days, Hutama Karya faces a 
complex project management to finish Ameroro Dam Project such as targeted to accelerate from the project 
schedule, remote access to the project, and managing communication with stakeholders in remote area. Therefore, 
an effective monitoring system is required by the organization to manage this project throughout the project life 
cycle. The required features to support the project monitoring are collected and combined into a platform to form 
the PMIS dashboard. The utilization of PMIS dashboard helps Hutama Karya as the main contractor to enhance 
the project management during the construction process. The PMIS is based on Building Information Modelling 
(BIM) and Geographic Information System (GIS) and is integrated with other company’s strategic platforms, 
which supports the data requirements for the project to provide a more reliable data, especially during the decision- 
making process. 


Figure 3. Ameroro Dam Project 
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3.2 Project management information system 


The existence of PMIS has significantly improved project management by delivering a more accurate construction 
process and a better project visualization for construction monitoring. The features available on the PMIS 
dashboard does not have any specific benchmarks and are developed according to the company's business process 
needs. The framework in Figure 4 shows the workflow from data collection to processing and output production 
of PMIS. The primary data such as BIM Model, Digital Survey Data, and Project Data is utilized as the foundation 
of PMIS dashboard which in this case study is built in the base of a GIS platform, thus geospatial map is becoming 
the base layer of the information. The production of BIM Model is required and processed with georeferenced 
coordinate system, inserted to be a raw data for Geo-BIM feature. The Geo-BIM referred to in this paper is the 
integration of the BIM Model with GIS, hence, to simplify further discussion it will be referred as Geo-BIM. On 
the other hand, Digital Survey obtained with Photogrammetry methods by using Drone Mapping. This digital 
surveying operation will produce point clouds and processed into Digital Terrain Model (DTM), Digital Surface 
Model (DSM), and Orthophoto as a raw data for Geo-BIM. Afterwards, the superimposed BIM Model and 
Orthophoto in the GIS platform will be beneficial as the progress visualization feature for project monitoring. 
Moreover, this PMIS dashboard is also equipped with video surveillance from live CCTV that will support the 
stakeholders for doing construction monitoring remotely. For a better understanding on how PMIS works for the 
project management process, it will be explained in Figure 4 as shown below. 
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Figure 4. PMIS Flowchart 
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3.3 Data Collection 


As a basis input for PMIS dashboard, data collection is undertaken using two methods namely manual methods 
with total station and 2-dimensional engineering drawings and digital methods by utilizing digital survey, BIM, 
and project data. The digital data collection includes Digital Surveying process to obtain the geographic data, real- 
time data, and making the BIM Model for the visualization and calculation needs. 


3.3.1 Manual Data Collection 


The conventional method of surveying is undertaken by surveyor using a Total Station Topcon GTS 230 and it 
took approximately 1,5 hours for 1 hectare. The manual surveying is undertaken by 2-3 surveying engineers. 
Furthermore, the 2-dimensional engineering drawing usually made at least 1 day for each drawing as shown on 
Figure 5. 


bolt) ey \ i 


Figure 5. Manual Data Collection with Total Station and 2D Engineering Drawing 
3.3.2  PMIS Data Collection 


3.3.2.1. Digital Survey 


At the earliest stage of the construction, Digital Survey was conducted by BIM Engineer to obtain geographic data 
to be superimposed with the engineering drawings. The digital surveying process is carried out with 
Photogrammetry method by using Drone Mapping that will perform aerial surveying and generate a collection of 
images of the object. The raw data was collected using a drone DJI Phantom 4 Pro V.20 shown in Figure 6. The 
surveying methods was started by arranging the flying route of the drone and setting up the overlapping rate 
between each image. Furthermore, Digital Survey method considered effective as it took only 15 minutes 
approximately for the area of 15 hectares per flight mission with 100 meters of altitude and produced Digital 
Elevation Model and Orthophotos as the output. This method is conducted regularly, and the data will be embedded 
on the PMIS dashboard for project monitoring needs. 


Figure 7. Photogrammetry Flight Mission 


3.3.2.2. Building Information Modelling 
During the construction process, Building Information Modelling (BIM) plays an important role as the project 


visualization. Hence, the BIM Model is an important input data for the PMIS dashboard, which can be later 
integrated with GIS. The data collection process started with the BIM 3D Model creation based on the detailed 
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engineering drawings using BIM Authoring tools. Afterwards, the coordinate points from the digital survey are 
inputted on the BIM 3D Model, so that this model has georeferenced data that can be integrated with GIS later on 
the PMIS Dashboard. The process for generating shop drawings from BIM 3D model only requires 4 hours to 
finish each drawing. This 3D model is also required to provide the design overview for stakeholders, as well as 
being the basis for decision-making in solving any clash problems. 
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3.3.2.3. Project Data 


Besides, the PMIS dashboard also integrated with project data such which compiles stakeholders name, project 
milestones date, location, and project remaining time to be showed in the project information. Moreover, the PMIS 
dashboard also integrated with project engineering data such as master schedule, monitoring schedule, 4D 
scheduling from BIM, plan and realization of project cost that becoming the focal point of the dashboard which 
will be featured as the project performance. 


3.4. Data Processing 
3.4.2. Manual Data Processing 


For the surveying phase, Total Station obtained RAW data such as distance and elevation level, and the output will 
be inputted manually on Microsoft Excel and processed in CAD software. Moreover, for the schedule data, the 
conventional method needs to input manually on Microsoft Excel or Microsoft Project for the monitoring purpose. 
This process usually requires 6 hours long to produce Cross Section data as a support for Project Monitoring. 
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Figure 10. Project Master Schedule 
3.4.3. PMIS Data Processing 


The graphic visualizations on this dashboard are supported by collaborating platforms and software which data 
kept in cloud-based or on-premises server. In this case, the company use both, the combination between cloud- 
based and on premises server because the data is interlinked. Nowadays, digital platforms are build based on 
collaboration which helps them to grow stronger with the ability on showing other platforms data through open 
API and interlinked data. As well in this PMIS dashboard, beside using open API to connect the BIM model data, 
the connector between platforms is developed in spreadsheets that previously arranged with required fields based 
on each visual requirement. Thus, the information inserted is categorically and historically inputted regarding the 
visual representation needed in the dashboard then transformed into bar graphs, doughnut graphs or gauge charts. 
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3.4.4. Project Information 


The project information as the basic data, compiles stakeholders name, project milestones date, location, weather 
condition, and remaining time that becoming the focal point of the dashboard. In the framework of project 
management, the remaining time shows in countdown to ensure the initial attention of the stakeholders are 
concerning on how the time and progress are in the right direction, otherwise the rest of information provided in 
this dashboard will be able to support the necessary decision making to guarantee the timely delivery of the project. 


Figure 11. Project Information Data Processing 


3.4.5. Project Performance 


Project Performance is used to determine if a project is in the Excellent, Warning, or Critical category based on 
the actual and anticipated project cost value, income realization, and plan. The Project Performance contains many 
information and integrated with the project engineering data such as master schedule, monitoring schedule, plan, 
and realization of project cost. Firstly, these databases are inputted in Microsoft Excel and processed on ArcGIS 
Pro. Then the dashboard is created on ArcGIS Dashboard Enterprise, and these S curve dashboards are inputted in 
PMIS dashboard through the ArcGIS Experience Builder. 


Figure 12. Project Performance Data Processing 
3.4.6. Geo-BIM 


During the data processing, the Digital Survey data which obtained from Drone Mapping will be processed to be 
superimposed with the BIM 3D Model. The output produced from the drone mapping were subsequently translated 
into Orthophoto and Digital Elevation Model (DEM) to be processed in Agisoft metashape. Afterwards, DEM can 
produce 2 outputs, namely Digital Terrain Modelling (DTM) and Digital Surface Modelling (DSM). DTM could 
be processed as a contour and can be developed as cross section and top surface in Autocad Civil 3D to produce 
cut and fill volume calculation for progress monitoring. Meanwhile the DSM or Digital Survey Modelling will be 
processed as 3D Object. The 3D object will be inputted in ArcGIS and overlayed with the BIM Model to see the 
construction progress, as shown on Figure 13 and 14. 


Figure 13. Orthophoto Data Processing 
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3.5. Result and Discussion 


After the data processing, many outputs showcased as the result of PMIS. The final output contains project 
performance, Geo-BIM, and Live CCTV which beneficial as the progress visualization and enhance the decision- 
makings during the construction progress. This dashboard usually used as project management plan by project 
stakeholders namely project manager, site engineer manager, engineers, and head-office management team to 
identify the current project status and regular monitoring purposes during the coordination meeting. 


3.5.2. Project Performance 


The main page of PMIS dashboard allows us to identify the current project status and project performance. It 
includes the project information such as project value and time of completion, project remaining time, project 
location, and the weather. The project performance feature also integrated with the project master schedule, 
showcasing the S Curve of the project progress. Moreover, it completed with the Schedule Performance Index 
(SPI), Cost Performance Index (CPI), and Efficiency Performance Index (EPI) which will resulting as the project 
performance conclusion, as shown on Table 1. Moreover, it also completed with the project data such as financial 
data, QHSSE data, Issue log and action plan. 


Table 1. Project Performance Formula 


SPI EPI CPI SPI = Schedule Performance Index 
>l >] >l EPI = Efficiency Performance Index 
>] <1 <] CPI = Cost Performance Index 


This formula is referred as Project Forecasting, that is utilized as a monitoring tool that ease the project 
management team to identify risks and take actions, as well as allocate the resources early. The result can be seen 
on the PMIS main dashboard which aims to increase the project team’s awareness of current project status as 
shown on Figure 15. 


Figure 16. Ameroro Dam Project Performance on PMIS Dashboard 
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3.5.3. Progress Visualization 


The aerial images produced from the Drone Mapping are shown in Progress Visualization feature as an overview 
of the construction progress. This feature is beneficial for regular monitoring purposes to ensure many points on 
the site that are difficult to reach manually, so that if there are any construction issues such as initial cracks occur, 
it can be detected and anticipated immediately. 


Figure 17. Progress Visualization 
3.5.4. Progress Monitoring with Geo-BIM 


As aresult of the integration between BIM 3D Model and digital data, Geo-BIM was obtained as one of the features 
of PMIS. In addition, the results from digital survey namely geospatial data, orthophoto, and 3D objects can be 
superimposed with the BIM Models and site progress data to be integrated in PMIS as digital twin to the existing 
conformance to the plan accurately with the appropriate coordinates as shown in Figure 18. Moreover, this feature 
also enables us to make a comparison between the planned construction action plan on the current month versus 
the realization of construction action plan, which both visualized in BIM overlayed with orthophoto. Manual 
methods monitoring can only use ms project in terms of scheduling and cannot provide overlay visualization data. 
Meanwhile, the progress monitoring using PMIS provides Geo-BIM that combined with the project schedule, that 
can be created for weekly action plan. Each realization progress will be inputted into the Geo-BIM as overlayed 
data. Then the result of this sumperimposed data can be compared for regular monthly monitoring to identify the 
deviations. Therefore, if there is any deviations, a strategy can be made to catch up the project schedule. Hence, 
the integration between photogrammetry with BIM Model accelerates the construction monitoring process as well 
as making the action plan with the help of the digital twin as the project visualization. Thus, this integration assists 
stakeholders on decision makings more accurately and quickly. 


Figure 18. Progress Monitoring with Geo-BIM 


3.5.5. Live CCTV 


There is a Live CCTV feature which integrated with site project cctv support the progress monitoring purposes. 
Moreover, this Live CCTV feature enable the stakeholders to do real-time monitoring remotely during the 
construction process. The CCTV units are strategically placed in the production and construction areas, showing 
not only the current activities but also assisting safety manager to ensure accident prevention with real time 
monitoring from the control room. 
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Figure 19. Live CCTV 


3.5.6. Comparative Analysis 


As we can see from the data above, the manual methods of project monitoring require a lot of time and manhour, 
rather than the digital monitoring through the PMIS. The comparative analysis below is affected by many factors. 
On the quantitative side, it affected by manpower and manhour needed during the data collection and processing, 
with formula as illustrated below: 


Data Collection Time Productivity Rate for every 1 hectare 


>» Manual Survey time (min) 


Productivity Rate = 
a al pans Survey time (min) 
Productivity Rat Le 60 

= |———| = x 
roductivity Rate 5 min 


Data Collection Manpower Efficiency 


_ X Manual — Digital Survey manpower (people) 
Ef ficiency Rate = =; m cr] 
> Manual Survey manpower (people) 
Efficiency Rate = |E Z2 P&OPIES| . 100% = 33% 
f ficiency Rate = 3 peoples x b = 0 


Data Processing Time Efficiency 


A >» Ms Project + BIM (hours) 
Efficiency Rate = Ms Project (hours) _ x 100% 
T 8 hours 
Efficiency Rate = IS hours|* 100% = 50% 


Data Processing Productivity Rate 


ae È 2D Drawings time (hours) 
Pee Ea >» BIM Drawings time (hours) 
= _ |8hours| _ 
Productivity Rate = Z hours) ~ 


On the other side, the utilization of PMIS also shows significant difference in terms of output quality, especially 
to support the project monitoring and decision-makings as shown on table 2. 
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Table 2. Comparative Result of Project Management between using PMIS and non-PMIS 


Factors Project Management without PMIS Project Management using PMIS 
Data Consisted of single-sourced data. The process requires | Consisted of integrated data, which more comprehensive 
Collection longer time to collect data from many involved parties | between time and visualization. The data automatically 
updated by the Engineer with a faster time. 

Project Manual monitoring requires longer time and cannot be | Project Monitoring with PMIS allows the integration between 

Monitoring schedule-based since the realization progress from the | GIS, BIM, and Project Schedule to be superimposed into 
surveying data is not integrated. monthly overlayed data for regular monitoring purpose 

Building It is unintegrated with the geospatial data so clash | It can be overlayed with the digital survey data to see the 

Information detection with the site cannot be checked conformance with the existing site as well as beneficial for 

Modelling progress monitoring 

Project The management team in project or headquarter may be | Increase the management team awareness of project current 

Performance unaware with project current status by the | status and beneficial to make a corrective action plan 
unavailability of updated project performance data 

Decision Requires a longer time by the unintegrated and less | The availability of updated project data and visualization 

Makings updated data and the absence of visualization, provide more reliable data and fasten the decision-makings 

Historical The engineering data are unintegrated and may be lost | The historical engineering data are stored and integrated in 

Data after the project finished one platform 

Limitation Requires longer time Highly dependent to online internet network 


4. CONCLUSIONS 


The presence of PMIS as a Digital Twin plays a crucial role during the project monitoring in ensuring the timely 
delivery of a construction phase of a project. It can be concluded from the results above; the quantitative data 
shows the digital monitoring is more effective than conventional monitoring during the data collection process. In 
terms of time, digital monitoring improves the productivity by 60x faster than the manual method, with manpower 
efficiency of 33%. Moreover, the implementation of BIM improves the productivity up to 2x faster than the 
conventional methods and cut the time efficiency by 50%. From the qualitative side, PMIS consisted of integrated 
data which more comprehensive between time and visualization, rather than without PMIS that consisted of single- 
sourced data. In terms of the output quality, the integration of BIM and GIS provide more reliable data to support 
the decision makings by the presence of updated project data and visualization, that can be overlayed with the 
digital survey data. Moreover, the project performance feature increase awareness of project current status that 
ease the project management team to identify risks, take actions, and make the corrective action plan. PMIS is 
going to be a major difference in asset management aspect, with the collected database from construction phase 
have been prepared to be handed over to the asset manager as the reference for maintenance activities. Meanwhile 
with the conventional methods, unintegrated engineering data are subjected to be scattered after the project finished. 
Therefore, it can be concluded that the utilization of PMIS accelerates stakeholders’ decision through data driven 
decision making while reduce and mitigate risk of cost overruns as early as possible with the findings on project 
monitoring. Future recommendation of the historical project data which collected in PMIS is expected for future 
project management plan, especially for project with the same characteristic. Besides, the PMIS utilization is 
improving collaboration and understanding of the project situation through Digital Twin with the data intensive 
communication and eliminate the inefficient process with remote monitoring. All in all, the digital twin technology 
nowadays has shown a significant role on providing the efficiency and optimizing the productivity on construction 
industry where each stakeholder should spot the benefit for their organization. Likewise, in Hutama Karya, PMIS 
capabilities currently in use are highly adaptable to future development and updating according to organizational 
needs and technological advancement. Hence, not only the tools are evolving but also the people need to be agile 
and adaptive to maintain its effectiveness to support company's business process excellence. 
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HUMAN-IN-THE-LOOP DIGITAL TWIN FRAMEWORK FOR 
ASSESSING ERGONOMIC IMPLICATIONS OF EXOSKELETONS 


Abiola A. Akanmu, Adedeji O. Afolabi & Akinwale Okunola 
Myers-Lawson School of Construction, Virginia Tech, Blacksburg, VA, USA. 


ABSTRACT: Exoskeletons are increasingly being recognized as ergonomic solutions for work-related 
musculoskeletal disorders in the construction industry. However, users of active back-support exoskeletons are 
susceptible to various physical and psychological risks, which could be exoskeleton type-or task-dependent. A test 
bed is needed to enable deployment and assessment of risks associated with exoskeleton-use for construction 
tasks. This study aims to develop a human-in-the-loop digital twin framework for assessing ergonomic risks 
associated with the use of active back-support exoskeletons for construction work. A literature review was 
conducted to identify risks associated with exoskeletons and the technologies for quantifying the risks. This 
informed the development of a system architecture describing the enabling technologies and their roles in 
assessing risks associated with active back-support exoskeletons. Semi-structured interviews were conducted to 
identify construction tasks that are most suitable for active back-support exoskeletons. Based on the identified 
tasks, a laboratory experiment was conducted to quantify the risks associated with the use of a commercially 
available active back-support exoskeleton for carpentry framing tasks. The efficacy of the digital twin framework 
is demonstrated with an example of the classification of exertion levels due to exoskeleton-use using a 1D- 
convolutional neural network. The study showcases the potential of digital twins for comprehensive ergonomic 
assessment, enabling stakeholders to proactively address ergonomic risks and optimize the use of exoskeletons in 
the construction industry. The framework demonstrates the significance of evidence-based decision-making in 
enhancing workforce health and safety. 


KEYWORDS: Digital twin, ergonomics, exertion, exoskeletons, risk assessment, sensing technologies, work- 
related musculoskeletal disorders. 


1. INTRODUCTION 


The United States Bureau of Labor and Statistics reports that the construction workforce continues to experience 
a higher rate of work-related musculoskeletal disorders compared with workers in other industry sectors 
combined. The back is one of the most affected body parts. Back-related injuries account for about 43% of all 
work-related musculoskeletal disorders (BLS, 2023). Exoskeletons are emerging as potential solutions to 
WMSDs. Specifically, active back-support exoskeletons, a class of exoskeletons, have been shown to reduce the 
risks of overexertion which is one of the triggers of back-related injuries. For example, studies have revealed a 
reduction in muscle activity (Theurel et al., 2018), discomfort in the body parts (Gonsalves et al., 2021; Kim et 
al., 2019), rate of exertion (Alemi et al., 2020; Baltrusch et al., 2021), and range of motion (Cumplido-Trasmonte 
et al., 2023) due to exoskeleton-use. These benefits are motivating construction contractors to explore active back- 
support exoskeletons for construction work. However, studies have also mentioned that exoskeleton-use on 
construction sites could trigger unintended consequences such as loss of balance or fall risks (Alabdulkarim et al., 
2019; Kim et al., 2019; Massardi et al., 2023), physical discomfort and pain (Gonsalves et al., 2023; Gonsalves et 
al., 2021), fatigue (Theurel et al., 2018), and restricting movement when climbing ladders (de Looze et al., 2016; 
Kim et al., 2019). These highlight the likely task-specificity of exoskeletons. In addition, the unintended 
consequences could also be specific to exoskeleton types (Fox et al., 2020; Kim et al., 2019). With the increase in 
commercially available exoskeleton solutions, a testbed would be beneficial to support the testing and assessment 
of the solutions for construction tasks. This has downstream implications for enabling real-time monitoring of 
exoskeleton-use during construction, which could inform strategies for reducing unintended risks. 


Sensing technologies provide opportunities to measure risks associated with exoskeleton-use (Akanmu et al., 
2020; Ogunsejju et al., 2021). Data from sensing technologies could be modeled and analyzed to extract insights 
that can be mapped to workers’ virtual replicas and controls (e.g., rating meters). This digital representation could 
be used by stakeholders (e.g., project and safety managers, and manufacturers) to develop control strategies such 
as deciding on the contextual use of exoskeletons (i.e., most suitable application tasks and duration of use), suitable 
exoskeleton types, and modifications to exoskeleton design. This digital representation is referred to as the human- 
in-the-loop digital twin which is a two-way symbiotic relationship between physical entities (e.g., workers and 
exoskeleton) and their virtual representative in which humans initiate the control (N. Zhang et al., 2022). The 
implication is that the ergonomic consequences of workers' postures while using exoskeletons can be obtained in 
real-time which can enhance their ability to control or self-manage their exposures (Ogunseiju et al., 2021). Thus, 
this paper presents a digital twin framework for assessing the risks associated with exoskeleton-use for 
construction work. A review of the literature was conducted to identify risks associated with exoskeleton-use and 
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the sensing technologies for quantifying the risks. The risks enabled the development of a system architecture to 
support the development of a human-in-the-loop DT framework that can inform decisions to support the 
sustainable use of exoskeletons in construction. Construction workers are interviewed to identify tasks that would 
benefit from active back-support exoskeletons. The efficacy of the DT framework is demonstrated with an 
example of predicting risk of exertion due to the use of an active back-support exoskeleton for one of the tasks. 
This study motivates discourse on human-in-the-loop Digital Twins for construction applications. The study 
highlights the extent to which physiological risks associated with exoskeleton-use can be predicted. 


2. BACKGROUND 
2.1 Exoskeletons for Construction Work: Risk and Assessment Techniques 


Studies have shown that exoskeletons have intended benefits and unintended consequences. For instance, 
exoskeletons are prospective innovative ergonomic interventions aimed at reducing overexertion in various parts 
of the body (Gonsalves et al., 2021). This may in turn reduce the occurrence of WMSDs among construction 
workers (Akanmu et al., 2020). There is evidence that exoskeletons can reduce back muscle activities by 23 - 35% 
(Abdoli-e and Stevenson, 2008). Other benefits revealed in the literature include reduced discomfort to the body 
parts (Alemi et al., 2020), increased productivity, financial gains, and work retention (Kim et al., 2019), and 
increased ability to lift heavier loads and perform repetitive tasks (Mahmud et al., 2022). Despite these benefits, 
there are some unintended consequences associated with exoskeletons. Exoskeletons have been known to trigger 
risks broadly classified in Table 1 as physical and psychological risks. The physical risks include joint 
hyperextension, instability and fall risk, muscle fatigue, bruising, skin and soft tissue injury, and increased 
cardiovascular demand and metabolic cost (Howard et al., 2020; Massardi et al., 2023; Theurel et al., 2018). For 
example, skin irritation or chemical burns could occur if an exoskeleton battery leaks corrosive materials to the 
user (Howard et al., 2020). In addition, due to the added weight of the exoskeleton, the user’s center of gravity 
may be significantly impacted causing balance problems and a diminished recovery rate (Alabdulkarim et al., 
2019). Gonsalves et al. (2021) showed that exoskeletons result in discomfort in the chest and thigh regions. 
Previous research has indicated that exoskeletons can redirect loading from one part of the body to another 
(Picchiotti et al., 2019). For example, during overhead work, exoskeletons reduce the muscle activity in the 
shoulder and the back of the arm but increase muscle activity in the lower back, abdomen, and legs (Theurel et 
al., 2018). Besides, usability, self-efficacy, and safety could be negatively impacted because the exoskeletons 
could get caught around wires and may affect work postures (Baltrusch et al., 2021). Exoskeletons are sometimes 
incompatible with some personal protective equipment such as safety harnesses (Gonsalves et al., 2023). Previous 
studies (Omoniyi et al., 2020; Siedl et al., 2021) have also identified psychological risks such as decreased 
situation awareness/distraction, cognitive overload, fear of the device, and overconfidence in the device. Various 
sensing technologies, such as inertial measurement units and electromyographs, have been employed to quantify 
these risks. Similarly, objective measures from the sensing technologies have been validated with subjective 
assessment instruments such as the NASA Task Load Index and Berg Balance Scale. These are highlighted in 
Table 1. 


Table 1: Risks and assessment techniques of exoskeletons. 


Categories of risks Risks Objective Assessment Subjective Assessment Related Studies 


Joint hyperextension Inertial measurement units; Local Perceived Pressure (Theurel et al., 2018) 


Cameras scale; Borg Rating of 
Perceived Exertion scale 
Instability and Fall Pressure insoles; Force Berg Balance Scale (Alabdulkarim et al., 
risk plates 2019; Kim et al., 2019; 
Massardi et al., 2023) 
Muscle fatigue Electromyography Borg Rate of Perceived Pain (Theurel et al., 2018) 
Physical risks Scale; Borg Rate of Perceived 
Exertion Scale 
Hygiene Biocompatibility tests Usability questionnaires e.g., (Howard et al., 2020; 
issues/Bruising, Skin System Usability Scale Massardi et al., 2023) 
and soft tissue injury 
Cardiovascular Electrocardiogram; Workload assessment (Moyon et al., 2018; 
demand Photoplethysmogram questionnaires e.g., NASA Theurel et al., 2018) 
Task Load Index (TLX) 
Metabolic cost risk Indirect calorimetry Questionnaires for workload (Alemi et al., 2020; 


assessment (e.g, NASA Baltrusch et al., 2021) 
TLX), Rating perceived 
exertion (RPE) with the Borg 
Category Ratio (Borg CR-10) 
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Usability of the Eye tracker; Focus groups; Usability (Gonsalves et al., 2021; 
device (e.g., Electromyography questionnaires; Borg CR 10 Kim et al, 2019; 
perceived discomfort, scale; Body part discomfort Ogunseiju et al., 2022) 
chest pain, catch and scale 
snag risks) 
Decreased situation Eye tracker NASA-TLX (de Looze et al., 2016; 
awareness/ Delgado et al., 2020) 
Distraction 
Exertion Electrodermal activity Rating perceived exertion (Man et al, 2022; 
Psychological sensor (RPE) with the Borg Category Theurel et al., 2018) 
risks Ratio (Borg CR-10) 
Fear; Lack of trust Electromyography; Interview; Self-developed (Omoniyi et al., 2020; 
Photoplethysmogram; questionnaires; subjective | Upasani et al., 2019) 
Electrodermal activity psychological impact test 
sensor 
Cognitive overload Electroencephalography; NASA-TLX; MF (M-VAS)  (Cumplido-Trasmonte et 
Eye tracker; Electrodermal and boredom (B-VAS) al., 2023) 
activity sensor 
Overconfidence Camera Videos; Optical Focus groups; Questionnaires (Baltrusch et al., 2021; 
effect tracking system (OTS) (e.g., Modified Spinal Siedl et al., 2021) 


Function Sort) 


2.2 Digital Twin for Ergonomic Risk Assessment 


In recent years, there have been increasing explorations of DT for diverse applications including workforce health 
and safety. Sharotry et al. (2022) developed a DT to track the biomechanical fatigue in operators caused by 
repetitive action in lifting activities. The study analysed changes in the joint angles in workers’ body joints using 
a dynamic time-warping algorithm. Another study (Greco et al., 2020) presented an ergonomic risk mapping of 
DT workstations using a wearable motion capture system and inputting in virtual simulated environments. With 
the DT, the authors were able to identify risk indexes related to working postures, exerted forces, material manual 
handling and repetitive actions, and sources of biomechanical overload. In construction, Ogunseiju et al. (2021) 
developed a DT framework for improving self-management of ergonomic risks. However, scarce studies have 
explored the assessment of exoskeleton-use using a DT environment. Furthermore, Greco et al. (2020) opined that 
existing DT frameworks have limited roles for users or stakeholders. Humans play vital roles in workplace 
systems; therefore, supporting technologies should be designed to facilitate their input (Sharotry et al., 2020). 


3. METHODOLOGY 


The approach employed in conducting this study is shown in Figure 1. First, a review of the risks associated with 
exoskeletons and the technologies for measuring the impact of the risks was conducted (Section 2.1). This 
informed the development of an architecture of a human-in-the-loop DT system for assessing risks associated with 
active BSEs. Semi-structured interviews were conducted to identify construction tasks that would benefit from 
the use of active BSEs. A laboratory experiment was conducted to quantify the risks associated with using active 
BSEs. This informed the development of an example of a DT-based model for assessing the risk levels of exertion 
during exoskeleton-use. These are described as follows: 


* Noise & Artifact Removal 
* Data Labelling 
* Data Augmentation 


Fig. 1: Overview of research methodology. 
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3.1 System Architecture 


The system architecture shown in Figure 2 is built to illustrate the proposed human-in-the-loop DT framework. 
The system architecture shows the enabling technologies and their role in supporting the assessment of ergonomic 
risks associated with exoskeleton-use. The architecture comprises six layers include physical layer, data layer, 
data transmission layer, storage layer, application layer, and access layer. These are described as follows: 


3.1.1 Physical layer 


The physical layer comprises sensing technologies and physical devices. The sensing technologies support 
capturing of physical and psychological risks, and environmental characteristics of work areas. The physical risks 
include local muscle fatigue, fall risk, joint hyperextension, and metabolic risk which can be measured using 
electromyography (EMG), pressure insole, inertia measurement unit (e.g., comprising of accelerometers, 
gyroscope, and magnetometer), and calorimeter respectively. The psychological risks include cognitive overload, 
lack of trust, and decreased vigilance which are measured using an electroencephalogram, electrocardiogram, 
photoplethysmogram, and eye tracker respectively. The workspace or site conditions can be captured using 
environmental sensors such as temperature and humidity sensors and image-based sensors such as cameras and 
laser scanners. Physical devices include reality technologies such as virtual and augmented reality devices, and 
other data acquisition technologies for collecting subjective data to evaluate the aforementioned objective 
measures. Reality technologies support the development of risk-free simulated construction site environments 
where workers can practice work with different exoskeletons. 


3.1.2 Data layer 


Data from the physical layer are captured in the data layer. The data layer contains the data generated from the 
sensors and physical devices, such as raw acceleration and angular velocity from the IMU, brain waves from EEG, 
electrical conductance of the skin from EDA sensors, eye fixations from eye trackers, muscle activity from EMG, 
and temperature and humidity from temperature and humidity sensors. Subjective data (such as perceived 
cognitive load, rate of exertion, and discomfort levels) are also stored in the data layer. This layer also contains 
videos of construction work and general characteristics of the work area that might explain or influence risk factors 
of WMSDs. 


3.1.3 Data transmission layer 


The data transmission layer transfers data from the data layer to other layers for storing, modeling and analysis, 
and DT representation. Different communication technologies could be used in this layer, such as short-range 
transmission technologies e.g., Wi-Fi, Bluetooth, Zigbee, near-field communication (NFC), and Zwave, and long- 
range transmission technologies e.g., 3G, 4G long-term evolution (LTE), and low-power wide-area networks. 


3.1.4 Storage layer 


This layer consists of cloud services that store data received from the data transmission layer and application layer. 
Heterogenous data from these layers are gathered and stored in a cloud storage system for exchange or sharing 
with other layers. The data or information can be beneficial for extracting other insights that can help improve the 
health and safety of workers. Depending on the stakeholders and their information needs, multiple repositories 
may be included. As such, different access rights may be provided. For instance, a data analyst may need access 
to label subjective data obtained from the data layer to enable assessments involving risk classifications. A 
safety/health manager may be provided access to data that can inform impact on workers’ health while impacts 
while a project manager may be provided access to data relating to impact on productivity. 


3.1.5 Application layer 


This layer includes algorithms and applications for processing and analyzing data obtained from the storage layer. 
The data are processed and represented in formats that can be used by decision-makers in the access layer (Section 
3.1.6) for decision-making. For example, to assess workers’ levels of exertion from their electrodermal activity 
(EDA) signals, this layer will use signal processing algorithms, feature extraction, and deep learning networks 
(e.g., conditional generative adversarial network, recurrent neural network, and long short-term memory), and 
visualization algorithms. Signal processing algorithms such as discrete wavelet transforms and adaptive predictor 
filtering methods will be used to reduce artifacts from the EDA signals. A symmetric multilayer perception model 
for extracting features will be used to extract informative features from the EDA signals. The extracted features 
will be fed into deep learning networks (e.g., conditional generative adversarial network, recurrent neural network, 
and long short-term memory) to classify the EDA signals into the levels of exertion. Finally, visualization 
algorithms will be used to augment the levels of exertion on a virtual replica of the worker and a rating meter. 


1236 


SECTION F - DIGITAL TWIN 


3.1.6 Access layer 


In the access layer, stakeholders can visualize the impact or extent of the risks as a virtual replica of the worker 
and a rating meter. This layer includes the following: (1) beneficiaries of the DT platform and (2) how they access 
the DT platform. The beneficiaries may include safety managers, project supervisors, and product manufacturers. 
Safety managers may want to understand if the workers are reaping the intended health benefits of the technology. 
Project supervisors may want to know the impact of the technology on project performance. Both stakeholders 
could use the feedback to work with manufacturers to plan more suitable designs for their projects. The 
stakeholders can monitor the performance of the workers via interactive dashboards and web applications. The 
performance of the workers will be shown in the form of their virtual replica and a rating meter to interpret the 
risks. For example, the levels of exertion associated with an exoskeleton (e.g., no exertion, low exertion, medium 
exertion, and high risk) that is computed in the application layer, will be shown as different colors in a virtual 
replica and rating meter (e.g., green, yellow, and red human). In this way, the project stakeholders can understand 
the type and extent of the risk, which could inform decision making such as which type of exoskeleton to use for 
what task, how long the exoskeleton should be used for the task, and changes that should be made to the device 
to better adapt it to construction work. 


Access 
Layer 


2S = OD 
Stakeholders Exoskeleton users 


Application 
Layer Signal processing 


algorithm 


Storage 
Layer 


Historical database Local database Database server 


Data 
‘ransmission 
Layer 


Data 
Layer 


Physical 
Layer 


Camera 


Virtual reality Exoskeletons 


Fig. 2: System architecture. 


3.2 Semi-Structured Interview 


Semi-structured interviews were conducted with industry practitioners (n=8) to understand the construction tasks 
that would benefit from the use of active back-support exoskeletons. A purposive sample was used to identify and 
select potential participants who could provide valuable insights to the study. The research team selected 
participants with experience in construction safety, and technology implementation in the construction industry. 
The interviews were conducted over Zoom and recorded. The transcripts of the interview were coded and 
emerging themes were identified. An inter-coder reliability test was conducted on the coded data using the Cohen- 
Kappa coefficient. Cohen-kappa coefficient of 0.90, indicating a strong level of agreement, resulted from the 
assessment. 
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3.3 Experimental Procedure, Participants, and Data Collection 


Sixteen students were recruited to participate in a carpentry framing activity; one of the activities identified from 
Section 3.2 as beneficiaries of active back-support exoskeletons. The participants reported no prior issues related 
to musculoskeletal disorders that could affect their performance in the study. The experiment was approved by the 
Virginia Tech Institutional Regulation Board (IRB: 19-796). The task involved the following: (1) measuring 
timber planks (i.e., four 1”x4”x47” planks and two 1”x4”x70” planks) needed to construct a 47”x70” frame; (2) 
assembling the measured timber materials as shown in Figure 3; (3) nailing the assembled timber frame using a 
nail gun; (4) lifting and moving the erected frame, which weighs approximately about 40lbs, to an upper floor via 
staircase for installation on the upper floor; and (5) installing the frame by aligning the frame with an existing 
wall. The participants completed these tasks while wearing CrayX, an active back-support exoskeleton from 
German Bionics. Their electrodermal activity was measured using Emotibit, an open-source biosensor. The data 
was collected at 50Hz i.e., 50 data points per second. After performing the framing task, each participant was 
presented with Borg’s rating of exertion scale (Borg CR-20) and asked to provide subjective ratings of their 
perceived exertion for the entire task. The Borg scale ranges from 6 (not exhausting) to 20 (extremely exhausting) 
(Albert et al., 2021; Borg, 1982). The participants’ ratings of perceived exertion were measured using the Borg’s 
exertion rating scale (Borg CR-20) which ranges from 6 to 20, where 6 represents no exertion and 20 represents 
maximum exertion. Figure 3 shows participants performing framing tasks with the exoskeleton and the biosensor. 
The task was video recorded. 


EEG cap 


CrayX Exoskeleton 


Fig. 3: Participant performing framing task while wearing an exoskeleton and EEG cap. 


3.4 Data Preprocessing 
3.4.1 Noise and artifact removal 


The collected EDA signals were filtered to remove the noise and artifacts. A low pass filter with a cut-off frequency 
of 4 Hz was employed to remove the noise. A Gaussian filter was used to smoothen and attenuate the artifacts. 
MATLAB was used for this purpose. 


3.4.2 Data labeling 


Using the time-stamped video, EDA data corresponding to each participant’s tasks were sorted and structured. 
The ratings of 7— 11, 12 — 14, and 15 — 20 were represented as low exertion, medium exertion, and high exertion, 
respectively (Chowdhury et al., 2019). The sorted EDA data of each participant was labeled based on their 
intensity class as shown in Table 2. 


Table 2: Classes, labels, and data points. 


Classes Labels Percentages of participants (%) Number of data points 
Low Exertion LE 70 50759 
Medium Exertion ME 12 6823 
High Exertion HE 18 14072 


3.4.3 Data augmentation 


The EDA data of the minority classes were augmented due to the imbalanced nature of the datasets. Studies have 
shown that balanced datasets result in the Synthetic Minority Oversampling Technique (SMOTE) being employed 
to balance the EEG data of the minority classes with the majority classes (Sowjanya & Mrudula, 2023). For 
example, from Table 2, the ratio of the datasets in the LE, ME, and HE classes (i.e., Low Exertion, Medium 
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Exertion and High Exertion respectively) is 7:1:4. As shown in Table 3, more datasets were generated with 
SMOTE to balance the datasets of the minority classes (i.e., Medium Exertion and High Exertion) with the class 
with the most datasets (i.e., Low Exertion). 


Table 3: Classes, labels, and data points (raw and balanced). 


Classes Labels Number of raw data points Number of balanced data points 
Low Exertion LE 50759 50759 
Medium Exertion ME 6823 50759 
High Exertion HE 14072 50759 


3.5 Risk Classification 


This study employs a 1-D convolutional neural network to classify the EDA data into the above-mentioned classes 
(1.e., LE, ME, and HE). 1-D CNN is suitable for 1D signals whose applications have high signal variations (Y. 
Zhang et al., 2022). The network comprises an input layer, a convolution 1-D layer, a batch normalization layer, 
a Rectified Linear Unit (ReLUV) layer, a dropout layer, a maxpooling layer, a fully connected layer, a softmax layer, 
and a classification layer. The convolution layer applies filters to the input data obtained from the input layer and 
extracts distinctive features using 10 filters of width 10. The batch-normalization layer improves the stability and 
speed of training the network by normalizing the input to each layer. The ReLU layer applies a non-linear 
activation function to the output of the batch-normalization layer. The dropout layer helps to prevent overfitting 
of the model. The maxpooling layer down-samples the output of the dropout layer. In the fully connected layer, a 
linear transformation is applied to the input vector through a weight matrix so that every input influences every 
output of the output vector. The softmax layer takes in the output from the previous layer and presents a vector 
that illustrates the probability of the class that the input belongs to. The classification layer presents the results of 
the softmax layers as classes of the assessed risks. The network was trained using the Adam optimizer (Karim et 
al., 2019). Due to the size of the dataset, 300 epochs were used. The learning rate was set to 0.01. 


The balanced data was split as follows: 70% of the data was set aside for training, 15% of the data was also set 
aside for validation and 15% was intended for testing the trained model. MATLAB R2023a, installed on a machine 
with NVIDIA GeForce RTX 2080 GPU and 16GB memory, was employed for the classification. Commonly used 
metrics for assessing the performance of machine learning models were employed in this study. These include 
accuracy, precision, recall, and F1-score (Bangaru et al., 2021). 


4. RESULTS AND DISCUSSION 
4.1 Construction Tasks for Active Back-Support Exoskeleton-Use 


The results of the semi-structured interview were represented in the form of a word cloud. Word clouds are 
graphical representations of the frequency of concepts or keywords that are significant in discourses (Adu, 2019). 
The word cloud in Figure 4 provides a quantitative and visualized method to illustrate the key construction tasks 
suggested by the participants to benefit most from active back-support exoskeletons. The most mentioned tasks 
include plumbing, carpentry, steel, drywall and rebar installation, and labor work. The least mentioned tasks 
include ceiling, electrical, scaffolding, and flooring work, mason, and ceiling work. The suggested tasks support 
reporting of industry databases (BLS, 2023) and research studies (Antwi-A fari et al., 2023; Gonsalves et al., 2023) 
that identified back-related injury as a concern in the construction industry. Some of the tasks suggested by the 
practitioners (e.g., carpentry work, rebar installation, concrete work, and masonry) were also identified by (Kim 
et al., 2019) as being suitable for exoskeletons. Similarly, Gonsalves et al. (2023) identified framing and plumbing 
as being more suitable for back-support exoskeletons. 
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Drywall 


Masonr 
plumbing?" 
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Flooring Rebar 
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Fig. 4: Word cloud of construction tasks that would benefit from active back-support exoskeletons. 


4.2 Example of Prediction of Level of Exertion from Exoskeleton-Use 


This study presents an example of predicting the levels of exertion due to exoskeleton-use. This section presents 
the performance of the 1-D convolutional neural network in classifying the levels of exertion during exoskeleton- 
use for a framing task. 


4.2.1 Model performance evaluation 


The accuracy of the 1D-CNN in classifying the levels of exertion due to exoskeleton-use for the framing task is 
82%. The confusion matrix for the model is illustrated in Figure 5. The matrix shows that the model performed 
better in detecting the ME and HE classes than the LE class. For example, the model detected classes ME and HE 
with 100% accuracy and LE class with 67%. Furthermore, 37% of the LE class is mostly confused with the HE 
class. 


LE ME HE 


33% 
LE 


HE 


True Class 


Predicted Class 
Fig. 5: Confusion matrix showing classification accuracies of levels of exertion due to exoskeleton-use. 


From the precision (see Table 4), it can be observed that out of the times ME and HE classes were predicted, the 
model was correct 100% of the time. However, out of the times HE class was predicted, the model was correct 
67% of the time. For the recall, out of all the times the HE class was predicted, only 75% of the class was correctly 
predicted. 


Fl-score explains a model’s ability to both capture classes (recall) and accurately capture the classes (precision). 
The F1-score of the ME class is 100%, meaning that the model has a balanced ability to accurately capture all the 
ME classes from the data. However, in the case of the Fl-scores of the LE and HE classes which are 80% and 
75% respectively, the model will have a mixed reaction. For the LE class, while the model may be correct 67% of 
the time, all the LE class predicted will be correct. The findings of this study have shown the effectiveness of 1D- 
CNN in classifying EEG signals (Alzahab et al., 2021). The lowest Fl-score obtained in this study is still high 
and can be compared to other construction-related studies (Xiong et al., 2022). 


Table 4: Performance metrics for ID-CNN for the classes. 


Classes Precision Recall Fl-score 
LE 67% 100% 80% 
ME 100% 100% 100% 
HE 100% 60% 75% 
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4.2.2 Level of Exertion and Digital twin 


The digital twin of the exoskeleton-users and the rating meter (shown in Figure 6) shows the level of exertion 
resulting from the results of the model. The digital twin shows an exoskeleton-user experiencing medium exertion. 
The meter comprises a pointer and three different colors, red, yellow, and green indicating high, medium, and low 
exertion respectively. The pointer reflects the level of exertion which is currently shown as medium exertion. 
Related studies have shown the possibilities of creating similar interfaces for data acquisition as a human-in-the- 
loop digital twin (Locklin et al., 2021). This exoskeleton risk assessment dashboard can also be extended to show 
the muscle activity, cognitive load, and fall rating of the exoskeleton user. This vital data collected via sensors 
can be useful for supervisors and managers to monitor workers while using the exoskeletons. 


Exoskeleton Risk Assessment Dashboard 


BB Low Exertion I Medium Exertion [J High Exertion 
Low Exertion 


Medium Exertion ý 
duh 


High Exertion 


Fig. 6: Dashboard showing digital twin representation of the level of exertion. 


5. CONCLUSIONS, LIMITATIONS AND FUTURE WORK 


This study aims to investigate a digital twin framework for assessing the risks associated with exoskeleton-use for 
construction work. A review of the literature was conducted to identify risks associated with exoskeleton-use, and 
objective and subjective methods for assessing the risks. A system architecture was developed to illustrate the 
enabling technologies and their roles in supporting the proposed framework. Results of interviews with 
construction workers identified carpentry framing task as one of the construction tasks that can benefit from active 
back-support exoskeletons. Electrodermal signals were collected during the experimental simulation of the 
framing task with an active back-support exoskeleton. 1D-CNN trained to classify electrodermal data 
demonstrates the potential of the DT framework to predict the exertion levels of exoskeleton users during framing 
tasks. This study contributes to the scarce literature regarding the use of digital twins for assessing the suitability 
of exoskeletons for construction work. The study demonstrates the role of physiological sensing and machine 
learning techniques in facilitating the implementation of the digital twin framework. Furthermore, this study sets 
precedence for research involving the use of digital twins for performance monitoring of exoskeletons during 
construction work. Such efforts could promote the sustainability of exoskeleton solutions in the construction 
workplace. This study had some limitations which are currently being addressed in an ongoing study. Firstly, 
EDA data was generated from students engaged in a laboratory-based simulation of framing tasks. The use of 
experienced construction workers could produce data that can help develop prediction models that are 
generalizable for the construction population. Secondly, only the exertion levels using the EDA data was modeled 
to demonstrate the digital twin framework. Further work will involve other sensing technologies and the prediction 
of other physical and psychological risks. Future studies will also involve a user assessment of the digital twin 
framework with intended users. 
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ABSTRACT: The management of the built environment is a topic that requires reference to the management of 
complex systems. In fact, the variety of domains involved means that the management of urban centers is not only 
complicated, and therefore it is not enough to model a set of rules that are representative of phenomena related 
to the real environment. Not only that, but what is evident is that emergency management lacks the ability to access 
real-time information that could be decisive. Having tools that provide real-time data, that reprocess it, and that 
are able to provide an enriched and slightly predictive view of what is happening offers the possibility of having 
a real impact in the management of the built environment. In this sense, digital twins are a valuable approach to 
achieving the desired results. Digital twins through the integration of technologies such as Internet of Things (IoT), 
simulators, Artificial Intelligence (AI), and Augmented Reality (AR) technologies make it possible to develop 
systems capable of exploiting the concept of collective intelligence, in a digital version, through a large number 
of heterogeneous agents working according to stigmergic mechanisms. This research work aims to propose its 
own architecture of digital twins for the management of resilient urban centers, with particular reference to the 
management of post-earthquake reconstruction scenarios. 


KEYWORDS: Digital twin, urban environment management, urban centers, smart cities, emergency management, 
BIM. 


1. INTRODUCTION 


The 2030 Agenda for Sustainable Development (Agenda, 2030) has set 17 Goals among which Goal 11 is defined 
as follows "Make cities and human settlements inclusive, safe, resilient and sustainable." In addition, there is a 
recent trend that identifies an increasing population shift to urban centers. According to the United Nations, over 
55 percent of the world population inhabited cities in 2008, with the percentage expected to rise to 68 percent by 
2050 (UN 2019). All of this implies the need to rethink cities focusing not only on the sustainability of these 
centers of aggregation but also with the intention of making them resilient to change and responsive to unexpected 
events. 


It should be noted that urban centers, cities are complex organisms (De Toni et al., 2013). This implies that it is 
impossible to manage the processes affecting them through rules, but a holistic and integrated approach is needed. 
One approach that has been pursued for years to the problem of managing urban centers is the development of 
smart cities. The term smart city is said to have first appeared in the middle of the 1990s, when the cities promoted 
themselves after introducing new information and communication technology (ICT) infrastructure or e-governance 
services, or when attracting technology companies to provide new economic growth to the region (Hollands, 2008). 
Anyway, a smart city should go beyond the mere use of ICT systems. It is expected that a smart city should improve 
the quality of life of its citizens, while simultaneously simplifying the management of the city. In some cases of 
developing smart cities the term “smart” was referred to an automated mechanism introduced to perform the 
desired activity within a given domain (Ahad et al., 2020). Other smart city paradigms relate technology more 
directly with innovation and human capital development, based on the concept that technology can give a city’s 
constituents the power to innovate, create, participate in society and solve problems collectively for the common 
good (Angelidou, 2015). In any case the current research on smart city does not fully address the complex nature, 
conflicts and interdependencies of the smart city objectives (Shamsuzzoha et al., 2021). 


Meanwhile in recent years we assist the rise of Building Information Modelling (BIM) approach in a wider way 
than the one that wants it to represent but a small part (a narrow building-level view) within the wider 
environmental context. BIM's massive introduction into the processes of the built environment has led to a higher 
level of digitization of all kinds of information, as this approach allows the achievement of high performance only 
when embedded in digital systems and platforms. This evolution of BIM should be carefully framed within a 
paradigm that factors in people, processes and new emerging technologies in an increasingly interconnected world 
(Boje et al., 2020). 


In this scenario it is expected that residents of future smart cities will be exposed to unprecedented amounts of 
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real-time information on a daily basis (Du et al., 2020). This data also coming directly from people and equipment 
and assets can contribute to the development of a stigmergic system. Stigmergy is a communication method used 
in decentralized systems by which individuals in the system communicate with each other by modifying their 
surroundings and leaving traces (Debreu Netto et al., 2015). Indeed, the scenario of the 21st century is increasingly 
characterized by interactions between the physical and virtual worlds, thanks to the progressive creation of a global 
connective space with a high intensity of information flows, the potential of Information Communication 
Technology (ICT), as well as the Internet of Things (IoT), of Big Data, Virtual and Augmented Reality, and the 
spread of progressively more powerful computational devices, whose high processing capabilities, albeit without 
discretion, are being defined, namely the so-called Artificial Intelligence and Machine Learning, and related 
predictive algorithms (Cinquepalmi and Pennacchia, 2020). All this leads to a currently emerging paradigm and 
that is the Digital Twin (DT) paradigm. Other industrial sectors than construction have been developing digital 
twin-based concepts over the past decade, but this approach has made its first appearances in the AECO sector 
only over the past few years. Digital twins are a digital replica of their real counterparts. A DT is always a 
representation and for this reason unlike its physical counterparts, it is not an all-or-nothing proposition. DTs can 
be tailored so as to choose to collect information only about features that have value for the stakeholders involved 
or for the aim it is developed for. There is a crucial aspect in DT that differs them from simulation models and 
even common smart cities paradigm (Fig. 1) and it is prediction. Numerous examples have shown how digital 
twins can continuously monitor operations and identify abnormal behaviors, allowing human operators to react 
promptly and reduce downtime (Arup, 2019). In any case in the longer-term, it is evident that no single DT will 
be sufficient for modern complex cities: in a smart city scenario, independent DTs of various assets will need to 
communicate and cooperate, providing feedback to a central decision making “hub” or city-level decision makers 
(Pregnolato et al., 2022). 


This paper presents a framework for implementing digital twins in urban centers. Building on what DA (Silva et 
al., 2018) showed as a technology for the smart city, our proposal is to include a reasoning layer through real-time 
data, which is one of the aspects that differentiates DT from the rest: short- to medium-term forecasting to provide 
decision support to the decision makers involved. The scenario that is particularly referred to is that of emergency 
management. Since city management requires dealing with complexity the digital twin is the ideal method to take 
into account different data and analysis in an integrated way. Starting with an analysis of the scientific bibliography 
concerning the digital twin and smart cities illustrated in the next section, the article aims to focus on real-time 
management and the insights that DTs can offer to support the management of the unexpected. Case study on 
which to implement the proposed framework will be presented. 


2. LITERATURE REVIEW 


Performing a search on the Web of Science, using the keywords "digital AND twin* AND "urban" AND "cit*" in 
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Fig. 1: a) Smart city architecture proposed by Silva et al., 2018; b) DT architecture whose fundamental layer 
that differentiates this from Smart Cities is the Reasoning and simulation layer that allow for short/medium 
term previsions supporting prompt decision-making particularly significant during emergencies. 


a time range from 2006 to the present, yielded a total of 379 articles, most of them in the area of Computer Science 
(21.1%) and Engineering (17.8%)(Fig. 2). Particularly in relation to the latter area, there were only 153 papers in 
total (Fig.2). Furthermore, the metric analysis showed that most of the publications are concentrated in the last 
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Fig. 2: The graph above shows the number of papers containing the terms “digital”, “twin” and “urban” in 
the period from 2006 to 2023 divided by subject area. The graph below instead shows the trend (referred to 
the same period of time) of papers with the same key words comprised in the area of Engineering. The spike 
in the adoption of certain terms from the years 2021-2022 is evident. 


five years, i.e. from 2018 to the present (Fig. 2). This data highlights how the topic is of recent interest in the 
international scientific community. From the 153 articles identified, a screening was first carried out by reading 
the titles of the articles and discarding those of no interest. Subsequently, the abstracts were read and the search 
areas were identified. These ranged from the topic of infrastructure and urban mobility to that of the construction 
and built environment. For each area, an article of particular interest to the study was identified, argued below and 
summarized in Table 1. 


Gang Yu et al. (2021) focus on urban infrastructure, proposing a digital twin framework for the operation and 
maintenance of tunnels. Haishan Xia et al. (2022) analyze the BIM GIS relationship within the digital twin 
framework used to support smart city development. The study is also conducted in the area of infrastructure, in 
relation to rail transport. Salem and Dragomir (2022), on the other hand, develop a literature analysis of the 
construction project management using digital twin approach. Lv et al. (2022), Corrado et al. (2022) and Nica et 
al. (2023) carry out their research in the field of urban planning, also from a sustainable perspective (Corrado et 
al., 2022; Nica et al. 2023). In particular, Lv et al. reflect on the importance of using DTs urban platform to improve 
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the future planning vision. Yu et al. (2023) analyze the support that new digital technologies can offer in the 
creation of smart cities, as does Wenhua Huang et al. (2022) in relation to the Internet. Other areas of interest are 
those concerning the use of the Digital Twin for the management of the built environment (Rotilio et al., 2023) 
and urban mobility (Yeon et al., 2023). 


The research performed showed that the topic of DT in support of smart cities is an area of study of recent interest 
within the international scientific community and that it is addressed in specific areas, lacking a global and 
multidisciplinary approach. This research gap highlights how further research is needed, which this article aims to 
help bridge. Indeed, the main objective of the research presented here is to provide a framework that, based on a 
DT approach, supports the management of the complex contexts typical of post-disaster reconstruction in which 
many issues and disciplines converge. A demonstrator will be presented for framework validation. 


Table 1: Summary of the main references analyzed 


Reference, Aim of the research Results Search area Case study 
year 
Gang et To propose a digital twin-based Results show that the framework can Urban Fault cause 
al., 2021 decision analysis framework for the provide efficient and automatic decision infrastructure analysis of fans 
O&M of tunnels analysis support for the O&M of tunnels maintenance, in Wenyi Road 
operation and Tunnel in 
evaluation Hangzhou, 
China 
Haishan et To study the combination BIM and A professional disconnect and fragmented Rail trasportation - 
al., 2022 GIS for the urban digital twin to composition pose challenges in the field of 
support sustainable smart city GIS and BIM integration. Future research 
design should focus on smart city planning, 
updating, management; ontology-based 
GIS and BIM data integration platform; 
and operation; and the collaborative 
management of urban rail transportation 
engineering 
Salem and To review the literature on Authors propose a framework for Construction area - 
Dragomir, construction project management analyzing and supervising the 
2022 through the lens of digital twins development of digital twins that uses 
three main stages: the BIM; the existing 
monitoring and actuation digital twins; the 
artificial intelligence 
LV et al, To promote the expansion and The construction of DTs urban platform Urban planning - 
2022 adoption of Digital Twins (DTs) in can improve the city’s perception and 
Smart Cities (SCs) decision-making ability and bring a 
broader vision for future planning and 
progression. 
Corrado et To propose a comprehensive A  metric-driven framework for Sustainable urban Buildings in the 
al., 2022 approach that takes the multiple sustainability planning that understands a planning Technical 
facets of sustainable urban planning city as a sociotechnical complex system University of 
into consideration was proposed Crete campus 
Nica et al, To inspect the recently published  Internet-of-Things-based smart city Sustainable urban - 
2023 literature on digital twin simulation environments integrate 3D virtual governance 
tools, spatial cognition algorithms, simulation technology, intelligent sensing networks 
and multi-sensor fusion technology devices, and digital twin modeling 
Yu et al, To study whether and how current The research provide suggestions for the Smart cities China 
2023 practices based on digital twin development of digital twin technology- 
technology can help the based ecosystems in emerging economies 
development of smart cities in 
China 
Wenhua et To propose a basic concept of The key problems solved by digital twin Urban - 
al., 2022 digital twin, and gives the technology are detailed and CloudIEPS, an infrastructure 
construction method and possible energy internet planning platform based on 
applications of the energy internet digital twin, is introduced 
digital twin 
Rotilio et To realize a first extended Digital Twin is exploited as an adaptive Built environment DT 
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al., 2023 framework enabling the system for the built environment, as a demonstrators 
implementation of digital twins in support to optimize _ post-disaster in the 
the built environment reconstruction processes with a focus on laboratory 
reactive security management 


Yeon et To introduce DTUMOS, a digital DTUMOS is a versatile, open-source Urban mobility Large 
al., 2023 twin framework for urban mobility framework that can be flexibly and metropolitan 
operating systems adaptably integrated into various urban cities including 
mobility systems. His novel architecture Seoul, New 
combines an Al-based estimated time of York City, and 
arrival model and vehicle routing Chicago 
algorithm 


3. RESEARCH METHODOLOGY 


The proposed application framework for the development of DTs to support the management of urban centers 
begins with the identification of needs. It should be kept in mind that a DT however comprehensive is always a 
representation of certain components of reality. For this reason starting from the identification of needs or we 
could say critical issues for which decision support is desired can begin to design the digitization of the real world. 
This analysis can only start from the dialogue with the stakeholders concerned, the administrators of the territory 
(Public Administrations) and the decision makers who will then be the end users of the product. 


The critical issues and needs thus identified lead to a second analysis concerning which aspects, parameters or 
agents need to be monitored in order to collect data useful for managing the identified issue. The connection in 
fact between the real world and the digital world is given by the Data Integration Layer (that will be better 
explained in the next section) which collects data through the use of the most suitable technologies. The choice of 
the latter also comes from a thorough analysis not only of what data needs to be collected but also of the context, 
the instrumentation possibilities of the environment, the data transmission technologies themselves, etc. Once the 
data have been collected and integrated with each other, we move on to the use phase. 


As previously mentioned what makes a digital twin different from simulation systems is the ability to make short- 
and medium-term predictions by exploiting collective intelligence. To do this, it becomes necessary to identify the 
best tools for this intelligence to emerge and be recognizable and interpretable. Artificial intelligence tools, 
Bayesian networks, and game engine-type agent simulators are some of the possible means of interpreting data by 
highlighting patterns or recognizing behaviors, applying multifactor interaction knowledge on problems, or 
performing high repetitions of random simulations to probe all possible scenarios. Depending on the data collection 
method, the subsequent possible processing also varies and thus the choice of the two components of the DT is 
strongly influenced. 


Finally, the last aspect that profoundly affects the system is how the information resulting from the processing can 
be transmitted to the relevant stakeholders. The first choice to be made is whether it will be information used in 
the back office or directly on site. In both cases, the solution involving the implementation of a cloud platform is 
the best since it allows more immediate access to the data even from smartphones. In the case of information 
displayed directly on site in addition to the previously mentioned smartphones, solutions involving augmented, 
diminished or mixed reality can also be used. Although tools for displaying information as holograms are not yet 
widespread on a large scale, the power of superimposed visualization is for many applications of extreme 
importance. However, it should be kept in mind that recently registration techniques are being developed such that 
overlay visualization can be achieved even with the much more common smartphones. 


In the following paragraph the application of what was introduced in the methodology will be translated into the 
architecture of the proposed system. 


4. A FRAMEWORK FOR THE IMPLEMENTATION OF DIGITAL TWIN IN 
URBAN ENVIRONMENT 


Weare sometimes faced with some confusion in differentiating simulation systems from actual digital twins. There 
are two main characteristics that put together determine and differentiate a DT from the rest: real time and short- 
and medium-term prediction. The forecasting capability that is the one that most characterizes a DT is achieved 
by an intelligence component that can predict unexpected situations and propose optimized solutions based on data 
from the physical layer in real time. The proposed integration methodology for an optimal DT implementation 
involves the development of a framework architecture that consists of four layers added to the physical one: the 
data acquisition layer, the data integration layer, the digital twin modeling layer and finally the user interface layer 
renamed the presentation and service layer (Fig. 3). 
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The physical layer refers, in this case, to the urban environment that is the object of the digital twin. Buildings, 
roads and facilities of cities are all considered since this will enable a valuable support to decision. The different 
level of scale of assets covered and the multi-domain visualization of information integrated with each other will 
provide a look at all crucial aspects at once. 


Data acquisition layer (Data collection layer in Fig.1-b) is a functional representation of the interface between the 
physical layer and the framework and is aimed to fetch data to digital models. In this layer are managed gathering 
technologies and transportation data protocol parsing. Data can be divided into two distinct categories: static and 
dynamic data. Static data are related to the configuration of context, thus not changing continuously. It includes 
preliminary survey data and characterization of physical asset (e.g. BIM and GIS models, Point Cloud models, 
etc.). The purpose of collecting these data on an urban scale is to report a representation of the built environment 
as built and in its current state of preservation. Given the heterogeneity of the type of constructions objects of the 
representation (buildings, roads, bridges, infrastructure) data will also have heterogeneous formats and sizes, as 
well as different scales of representation of the data. Dynamic data on the other hand are related to observations 
about relevant aspects for the behavior simulation of the physical system. Dynamic data are typically transported 
in XML or JSON format. Cameras, wireless sensors, citizens' smartphones can all contribute to the acquisition of 
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dynamic data, and by wisely choosing their placement these help create a stigmergic environment in which it is 
the assets, equipment, and people themselves (whether workers or citizens) that supply the DT with data. 


Representation could be solved with the adoption of the Semantic Sensor Network Ontology (SOSA). SOSA 
ontology is recommended by the World Wide Web Consortium (W3C) for describing sensors and their 
observations, the involved procedures, the studied features of interest, the samples used to do so, and the observed 
properties, as well as actuators. Procedure for data acquisition optimization are fundamentals so as to avoid huge 
amounts of useless gathered data to analyze and improve the efficiency of the system. 


Data Integration Layer will be in charge of integration of data at different scales (such as BIM and GIS 
representations), of different formats, and multidisciplinary. Beside this first task of data integration layer it has to 
be taken into consideration that most of popular data models (e.g. IFC, CityGML) do not claim to be based upon 
a formal Top Level Ontology since they have been developed pragmatically over years, and have been validated 
by practice. After the definition of the industrial exchange format IFC (Industry Foundation Classes), an important 
step in the direction of extending the scope of building data models is the recent introduction of IffOWL ontology 
standard which is the Web Ontology Language format of IFC. This advance goes towards the embracement of 
Web of Data technologies to model the context of interest (buildings and assets). On the same line, research 
methodological approach is inspired by the expected benefits obtainable with Web of Data technologies such as 
the ability to granularly link data from different models. Web of Data consists of two technologies that are built 
upon the basic Web then relying on Universal Resource Identifiers (URIs) and Hypertext Transfer Protocol 
(HTTP): Semantic Web: RDF (Resource Description Framework) for graph-based data representation, and OWL 
(Web Ontology Language) for specification for shared conceptual models; Linked Data: Principles for specifying 
the interrelations and access across different datasets. 


Both technologies contribute to the specification of the shared meaning of entities, one of the crucial problems in 
interoperability. This approach allows to support the integration of BIM models represented in IffOWL with other 
building data referred to various ontologies directly in RDF; so many XML-based formats will also be supported 
such as CityGML-based format (for city scale representation) or sensor and observations data ontologies. 


Then there remains the issue of data that are unstructured or do not have their own ontologies. This aspect for what 
concerns DT at the urban scale is extremely relevant since the vastness and heterogeneity of the elements taken 
into account makes the possibility of interfacing with multidisciplinary and multiscale data very realistic. A 
powerful framework therefore having to take this into account should contain the possibility of including new 
ontologies for both data and semantic links between structured and unstructured data. The framework proposed 
envisages the possibility of sharing data stored in RDF repository on the Web so that users can access them in a 
granular manner. Besides graph-based data model, schema/ontology languages, serialization formats, and 
query/reasoning languages, should provide a RESTful API technology for clients to interact with data. 


Data Integration Layer can be mapped to Data Transmission Layer in Fig. 1-b because they methods shown also 
provide the means by which to communicate some of the data itself. 


DT modeling layer (Data management layer plus Reasoning and simulation layer Fig. 1-b) is a virtual layer ideally 
containing the Digital Twin model or models. These differ according to the purposes for which they are 
implemented (e.g. in Fig.3 one DT for supply chain management, one for environmental pollution management 
and the last for H&S management). This layer contains the digital copy of the asset and the simulation tools for 
behavior forecasting. In the framework here proposed DTs are thought to be distributed over the cloud and 
connected to the framework via the internet. Different DTs are implemented by using domain specific algorithms 
and tools and are fed by Data Integration Layer through the REST Interface. Any DT internal structure is highly 
dependent on its specific domain and scope. Nevertheless, it must have a normalized external interface to link 
itself into the framework. Such interface is minimal and includes following data flows (usually in XML/JSON 
format): first configuration parameters and variables, time synchronizing clock, input and output streaming and 
exposed variables shared with the framework. 


Finally, the presentation and service layer (Application layer Fig. 1-b) is a virtual layer on top of the framework 
that enables interaction with people/stakeholders. It corresponds to the RESTful API interface for retrieving 
information from the model integration bus. This layer is designed as a layer for distributing information to external 
management systems and specific user interfaces. This layer allows interaction with the population, thus also 
defining the DT as a tool for improving communication with and active participation of the citizenry. On the other 
hand, this layer is also the one that allows customers or decision makers to visualize data and possible suggested 
corrective actions. Again, the services toward which one can strive are diverse: three possible outputs of DTs are 
shown in Fig.3, namely support for site safety management, support for decision makers involved in real estate 
asset management, and communication with the public. It is at this level that the services that the digital twin will 
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take care of are defined in detail, and so in a reverse engineering effort we start from here to understand at the root 
what data will need to be collected to meet certain set purposes. 


5. CASE STUDY: L’AQUILA 


The application for the proposed framework is the municipality of L'Aquila, in Abruzzo region, Italy. L'Aquila is 
a municipality that well represents the previously mentioned concept of complexity. First of all, it is a city that is 
located in an earthquake zone and periodically subjected to earthquakes that then result in subsequent states of 
emergency. L'Aquila is a historic city, and this entails two fundamental aspects: on the one hand, the city's housing 
stock is likely to be more fragile, less resilient to change, and with historical and artistic value to be preserved at 
the same time. On the other hand, its being an ancient city means that some infrastructure especially in the city 
center has drawbacks: narrow streets, uneven paving. All these aspects put together make the DT paradigm a very 
efficient and comprehensive method for dealing with problems. 


The case we wanted to focus on is the implementation of a digital twin for safety support at construction sites and 
more specifically with regard to prop removal operations. In the most common post-earthquake scenarios, there is 
a frequent need to realize temporary shoring mainly for safety purposes, related to the use of the buildings 
themselves and of the adjacent public spaces (Fig. 4). Despite being conceived as temporary, the shoring is 
frequently intended to last more than a decade, often without undergoing revision or maintenance. Frequently they 
also create hindrance to the viability and free road travel because they occupy portions of public land. It is for this 
reason that their removal for the execution of recovery operations is a particularly delicate phase. This first 
demonstrator aims to test the ability of the proposed framework to integrate urban impact models of a construction 
site taking into account criticalities related to the removal of temporary shoring. The dismantling of this system is 
of great interest because it determines significant impacts in reconstruction sites, both in terms of operators' safety 
and organization and management. 


The proposed DT will work by projecting short- and medium-term forecasts based on data streams generated by 
monitoring networks. These networks enable early warning strategies based on real-time predictive analysis of 
collected data, while also performing high-speed and qualitative simulations to assess specific risks and behaviors. 
The sensor network defines a fourth-level monitoring system, that is, a system that can estimate the location and 
extent of damage and use this data to determine the state of the overall structure, and thus its level of safety. The 
reference models are a simplified version of the FEM model simulator, linked to the HBIM model of the building 
aggregate, or structurally connected set of buildings. The DT thus implemented will work on three types of data 
sources: [a] sensor data from a network of sensors installed in the building aggregate and on temporary shoring, 
with the intent of monitoring relative displacements of structural elements, subsidence, and deformations, and [b] 
data communicated in real time through the sensors, arising from neighboring construction sites related to ongoing 
activities and positioning, vehicles and thus additional vibrations generated, and [c] contextual data communicated 
in real time related to urban processes that may interfere with or be jeopardized by the simple execution of 
temporary shoring removals or collapses. 


This will enable the DT to provide valuable support during the reconstruction phase when the dismantling of 
temporary shoring may lead to the definition of deformation close to sudden collapse conditions. At the same time, 
the DT will integrate and interpret the data from the sensor network based on the three previously identified sources 
and generate an early warning if the limit values are exceeded. In this way, the decision maker will be informed 
and will be able to activate the necessary measures: stop work, if necessary, and evacuate the surrounding area. In 
addition to this, the DT integrated in a communication platform with the population will provide information about 
the works in progress so as to provide support to the citizen. Indeed, the latter could change his or her route by car 
if the site's workings block the passage or his or her walking route also in relation to the possible dust emissions 
reported by the DT. Finally, the continuous application of DT during the management and operation of the building 
stock could facilitate the transition from scheduled maintenance to an "on-demand" approach, in which the 
building itself communicates the necessary maintenance actions. Specifically, during the execution of 
consolidation work, the building aggregate will be equipped with a network of sensors aimed at conducting 
structural monitoring. The collected data are exploited to verify the performance of the buildings over time, 
allowing a continuous assessment of their safety and the opportunity to plan appropriate rehabilitation activities to 
reduce their vulnerability. The results in terms of process innovation introduced by the proposed research will 
support actors involved in the reconstruction and management of smaller historic centers, particularly site safety 
coordinators to coordinate and plan removal work. 
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SECTION F - DIGITAL TWIN 
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Fig. 4: Examples of shoring in HBIM for a historic building. 


6. CONCLUSION 


This paper presents a framework for implementing a digital twin in an urban environment that by its definition is 
characterized by complexity. The scenario chosen is that of post-earthquake emergency management and 
subsequent reconstruction phases. The configuration of an integrated system such as DTs that collect real-time 
data directly from the physical context allows for stigmergic systems. These imply that it is directly the 
environment, or rather the agents within it, that leave information on the ground for other agents. Inolte the DT 
through real-time simulation systems and data analysis is able to provide short- and medium-term forecasts that 
ensure decision support, both for stakeholders involved in land management, for those responsible for the 
construction and repair phases, and for the community. 
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ABSTRACT: This study discusses the classification of Digital Twins (DTS) and their use in the Architecture, 
Engineering, Construction, and Operations (AECO) industry, the differences between building information 
modeling (BIM) and DT are emphasized and platforms for implementing DTs are compared. DTs are quickly 
gaining traction in the AECO industry because they create the ability to interact virtually with all physical smart 
devices in the built environment. The need for replicas goes all the way back to the 1960s, when NASA created 
physical replicas of spaceships and connected them to simulators to develop workshop solutions on the ground. 
DTs are simply building blocks of the metaverse that act as a real-time digital copy of a physical object. Based on 
data from the physical asset or system, the physical twin (PT), a DT unlocks value in supporting smart decision- 
making by combining artificial intelligence (AI) with the internet of things (IoT). 


KEYWORDS: Digital Twins; Internet of Things; Artificial Intelligence; Asset Management. 


1. INTRODUCTION 


Digital Twins (DTs) are quickly gaining traction in the AECO industry are quickly becoming synonymous with 
smart cities because they create the ability to interact virtually with all physical smart devices. DTs allow us to 
integrate in a physical environment data and information on what is happening in that environment thus converting 
that physical environment into a virtual one, that can be used in real time, or in aggregate, to facilitate the analyses 
of physical spaces. 


In the AECO industry, the focus on DTs has increased due to the proliferation of digitalization and integration 
processes. For example, there has been a marked growth in the development of Building Information Modeling 
(BIM), but while BIM has instigated the digitalization and integration of design and construction information, its 
utility for smart decision-making in the postconstruction stage is limited as the data and information captured in 
BIM outputs are static. Consequently, the need for technology that enables dynamic optimization of BIM data and 
operational data has grown. DTs have shown early promise in the AECO industry to elevate BIM from static to 
actionable and dynamic virtual models. 


2. BACKGROUND 


The path from building information modeling (BIM) to DTs involves the integration of multiple data sources (see 
Fig. 1). 
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Fig. 1: Essential components to create a DT of a building (Adapted from Khajavi et al. 2019) 
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2.1 Building Information Modeling (BIM) 


BIM as a digital technology continues to stimulate new workflows in the AECO industry. The capabilities of BIM 
to generate three-dimensional visualizations, and data rich models have resulted in its extension to facility lifecycle 
activities including facility management, operations and maintenance (O&M), commissioning and close-out, 
energy management, and space management, all key aspects of FM (Becerik-Gerber et al. 2012). Major progress 
has been made in the efficient transfer of design and construction data to FM systems, e.g., Computerized 
Maintenance Management Systems (CMMS), by using open standards, e.g., Construction Operation Building 
information exchange (COBie). Identification and specification of the required data during the facility design stage 
and requiring appropriate BIM deliverables is essential to developing models which are beneficial to facility 
managers. The data allows facility managers to analyze operational data while allowing owners a complete view 
of their assets (Asare et al. 2021). 


Selecting the proper BIM level of development (LOD) is crucial in successfully developing a DT. The LODs for 
sharing building information models with project participants, as prescribed in the American Institute of Architects 
(AIA) E202 contract (AIA 2022), range from LOD 100 to LOD 500 (see Table 1). BIM, specifically the LOD 500 
model, is the foundation of the Existence DT. The next step in the integration of BIM, FM and O&M lies in the 
development of DTs. 


Table 1. AIA E202 LODs (AIA 2022) 
§ Levels of Development (LOD) 


4.2 100: The Model Element may be graphically represented in the Model with a symbol or other 
generic representation. but does not satisfy the requirements for LOD 200. Information 
related to the Model Element (e.g., cost per square foot. tonnage of HVAC, etc.) can be 


delivered from other Model Elements. 


4.3 200: The Model Element is generically and graphically represented within the Model with 


approximate quantity, size, shape, location, and orientation. 


4.4 300: The Model Element, as designed, is graphically represented within the Model such that its 


quantity, size, shape, location, and orientation can be measured. 


4.4.1 | 350: The Model Element, as designed, is graphically represented within the Model such that its 
quantity, size, shape, location, orientation, and interfaces with adjacent or dependent Model 


Elements can be measured. 


4.5 400: The Model Element is graphically represented within the Model with detail sufficient for 


fabrication, assembly, and installation. 


4.6 500: The Model Element is a graphic representation of an existing or as-constructed condition 
developed through a combination of observation, field verification, or interpolation. The level 


of accuracy shall be noted or attached to the Model Element. 


2.2 Internet of Things (IoT) 


The internet of things (IoT) presents us with opportunities for transforming work and everyday life. The IoT is at 
the intersection of the physical and digital worlds, with all kinds of devices harnessing the power of 
interconnectivity to provide seamless experiences for businesses and consumers alike. To reach its full potential, 
IoT has to shift from continuing to provide incremental value amid siloed clusters to unlock its vast potential value 
as a fully interconnected IoT ecosystem. This will require an integrated IoT network within and across all industries. 
The main obstacle to be confronted is the cybersecurity risk which detrimentally impacts the trust needed to 
integrate IoT applications and networks. For smart cities, as with other applications, the expected solution lies in 
the merging of IoT and cybersecurity to form a new, integrated system (Greer et al. 2019). 
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2.3 Digital Twins (DTs) 


ADT is comprised of three principal parts: a physical system in real space, the physical twin (PT); a virtual system 
in cyberspace, the DT; and the connection between real and cyber space for transferring data and information using 
cyber-physical systems and the internet of things (CPS/IoT). A DT creates an accurate digital model of the physical 
system in cyberspace that can accurately replicate and simulate the behavior of the PT. According to Tao et al. 
(2018), a DT can also provide a digital footprint of products by integrating geometry, structure, behavior, rules and 
functional properties. Salvador Palau et al. (2019) noted that DTs can also be considered as intelligent agents with 
prediction, communication, and data preprocessing capabilities. 


Kritzinger et al. (2018) distinguished the various digital forms and categorized them as digital model, digital 
shadow and DT based on the automated dataflow between them (see Fig. 2). The Digital Model is a digital 
representation of an existing or planned physical object that does not use automated data exchange between the 
physical object and the digital object. Changes in the state of the physical object have no direct impact on the 
digital object and vice versa. The Digital Shadow is derived from the Digital Model and represents one-way data 
flow between the state of an existing physical object and a digital object. A change in state of the physical object 
leads to a change of state in the digital object, but not vice versa. The DT is characterized by automated bi- 
directional data flow between the physical and digital objects, which possess intelligence and decision-making 
capabilities that enable the automated feedback loop to the physical entity. A change in state of the physical object 
directly results in a change in state of the digital object and vice versa. 


The combination of the PT and its corresponding DT is the fundamental building block of fully connected and 
flexible systems that are able to learn and adapt to new demands. Some of the DT roles include remote monitoring, 
predictive analytics, simulating future behavior, and optimization. To fulfil these roles, DTs rely on certain 
capabilities that exist across all of them. These required DT capabilities are summarized as follows (Redelinghuys 
et al. 2020): 


e Acquire PT state - The DT must be able to acquire data from a variety of sensor types (e.g., temperature, 
pressure and vibration sensors and counters or PLC registers) from the PT. The PT sensor data collected 
is refined and enriched (e.g., through combination and adding context) into information sets that describe 
the state of the PT. 


Digital 
Digital Model 


Object 


> Manual Data Flow 
— > Automatic Data 


Digital 
Digital Shadow 


Object 


---» Manual Data Flow 
—> Automatic Data 


Digital 
Digital Twin 


Object 
to = —> Automatic Data 


---» Manual Data Flow 


Fig. 2: Evolution from Digital Forms to Digital Twins (Adapted from Kritzinger et al. 2018). 
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e Maintain Information Repository - The state information obtained from the PT sensors is stored for easy 
access through the internet. This repository typically relies on Cloud-based storage, since large volumes 
of data may be stored for long periods of time. 


e Simulate operation - The simulation of the PT’s operation, i.e., predicting its future behavior from a given 
starting state and selected set of conditions, is required for some of the envisioned roles of the DT 
including the evaluation of new processes and different operation schedules. 


e Emulate operation - Using emulation to imitate and visually represent or reproduce the action or function 
of the PT in real-time using feedback from embedded sensors. 


Changes in the physical process will impact the digital world through the feedback of real-time embedded sensors 
and actuators. Using this data feedback, digital models can be used to interpret the behavior of machines or systems, 
and predict future state from real-time and historical data, as well as experience and knowledge. The core elements 
of a DT are the models and data. CPS, and the technologies required for developing CPS, are considered as a 
necessary foundation for implementing DTs. 


Major challenges to adopting DTs include global connectivity, data integration and interoperability, data 
standardization, security and integrity, real-time performance and reliability, as well as barriers to its 
implementation and legacy system transformation (Attaran and Celik 2023). These challenges play a fundamental 
role in the development of DTs, as the connections between the PT and its corresponding DT typically rely on 
internet enabled connectivity. 


3. CLASSIFICATIONS AND LEVELS OF MATURITY 


There is not one single definition for what a DT is or the capabilities it provides. There are numerous types of DTs 
and levels of functionality based on the needs or the organization or project and the maturity of the data available. 
The development of a DT is a continuum, with the model evolving with the addition of new data and capabilities, 
the following classifications were developed by KPMG (2022) to indicate the level of functionality of the DT 
based on the types and level of data and capabilities that the system provides (see Figure 3): 


e Existence twin: Furnishes principal project information, e.g., details on asset location and properties, 
enabling a single source of truth for asset data across the project. Traditional CAD and BIM systems are 
examples of an existence twin. This DT definition differs from that of the Kritzinger et al. (2018) 
classification of the evolution from digital forms to DTs. 


e Status twin: Provides information on the status and condition of assets as detected by embedded IoT 
sensors. This can provide important insights into construction quality and progress, as well as asset health 
status over its lifecycle, and enables prediction of future performance based on the data collected. 


e Operational twin: Allows for a real-time view of the project and the operational asset. This can provide 
critical insights into real-time performance and risks, both during construction and operations, and it 
enables more informed decision-making. 


e Simulation twin: Enables teams to assess the impact of different design, construction, and operational 
decisions, allowing optimization of improvements in cost, performance, and risk. A simulation twin 
enables better and more thorough planning and can help minimize the risk of costly design and 
construction errors and faulty operational control changes. 


e Cognitive twin: Uses AI and real-time data collection to analyze data, make decisions and optimize 
operational performance. This enables refinements made in real-time based on live data to adapt to 
existing conditions. 


These classifications imply that an organization does not need to develop a highly advanced and complex model 
to see value from investing in DT technologies. For example, existence and status DTs can provide important 
project management insights on the physical configuration, properties, budget and cost, and as-built condition of 
a project and its component assets. 
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Figure 4. Data and DT Maturity (Adapted from KPMG 2022) 


4. DIFFERENCES BETWEEN BIM AND DT 


The building information model focuses on replicating the physical asset throughout the asset’s lifecycle, while 
the DT replicates and enables a connection with the physical asset during operations. In BIM, there is total split 
between the digital and the physical asset, while, for DTs this split is distorted due to asset instrumentation and 
data synchronization between the physical and the digital asset. BIM is used in the three main phases of the built 
asset’s lifecycle, design, build, and operations (Brilakis et al. 2019), while the DT is focused primarily on 
operations. These models have varying degrees of detail for their specific use-cases, i.e., built asset design, design- 
construction coordination, optimal asset delivery, and facility management. For DTs, there are no standard 
specifications for model detail or fidelity available. BIM has limited support for asset monitoring and control and 
for asset performance simulations during operations, while the DT does not consider discipline coordination for 
built asset delivery (Delgado and Oyedele 2021). 


5. PLATFORMS FOR IMPLEMENTING DTs 
There are three categories of data platforms for implementing DTs (Adamenko et al. 2020): 


1. IoT Platforms: They provide data connectivity between the real and virtual world. Typically equipped 
with resources that establish connection between networked devices and the applications that process 
and/or visualize the data. Examples include Azure Digital Twins and IoT, Amazon Web Services IoT 
TwinMaker, Siemens MindSphere and Eclipse. DT design with such tools is more data-based. They 
provide a user interface (UI) for data modeling. 


2. Gaming Engines: These platforms facilitate the development of executable video game-like applications. 
Their high-end visualization capabilities can be combined with IoT Platforms to achieve user-friendly DT 
applications. Examples include Unreal Engine, and Unity 3D. They require extensive programming to 
model DT data. 


3. Commercial modeling and Simulation Platforms: These tools typically support design and 
implementation of system-based DTs. Examples include ANSYS, Autodesk Tandem, NVIDIA 
Omniverse, Microsoft Azure and Unreal Engine. They each provide a user interface (UI) for data 
modeling, e.g., Digital Twin Definition Language (DTDL) based on JSON-LD for Azure models, and the 
Universal Scene Description (USD) language for the Omniverse platform. 


6. CONCLUSION 


DTs provide a new outlook for smart decision-making in the AECO industry. From an operations and maintenance 
perspective, DTs can provide a dynamic view of facility status, enable operational control, support scenario 
planning and testing, and afford overall operational intelligence. To achieve high-performing DTs for smart 
decision-making, it is important to develop high-quality BIM outputs and supplement them with high-fidelity data. 
This requires a clear understanding of the different types of DTs, the tools for developing them, as well as knowing 
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how and where to apply DTs. A macro-level roadmap to transforming BIM to DTs has been presented. As adoption 
of DTs increases, it is important to address the issues of standardization of DT design and implementation. This 
includes testing implementation tools and methods towards identifying the creation of DTs that can truly make 
decision-making in the AECO industry smarter. 
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Wi the overarching theme of “Managing the Digital Transformation of Construc- 
tion Industry” the 23rd International Conference on Construction Applications of 
Virtual Reality (CONVR 2023) presented 123 high-quality contributions on the topics of: 
Virtual and Augmented Reality (VR/AR), Building Information Modeling (BIM), Simulation and 
Automation, Computer Vision, Data Science, Artificial Intelligence, Linked Data, Semantic Web, 
Blockchain, Digital Twins, Health & Safety and Construction site management, Green buildings, 
Occupant-centric design and operation, Internet of Everything. The editors trust that this pu- 
blication can stimulate and inspire academics, scholars and industry experts in the field, 
driving innovation, growth and global collaboration among researchers and stakeholders. 
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